Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2008 by Oxford University Press, Inc.
Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com
First issued as an Oxford University Press paperback, 2010
Oxford is a registered trademark of Oxford University Press
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.
For sound and video examples referenced in this book, visit www.oup.com/us/patel.
Library of Congress Cataloging-in-Publication Data
Patel, Aniruddh D.
Music, language, and the brain / Aniruddh D. Patel.
p. ; cm.
Includes bibliographical references.
ISBN 978-0-19-975530-1
1. Music—Psychological aspects. 2. Music—Physiological aspects. 3. Auditory perception—Physiological aspects. 4. Language acquisition—Physiological aspects. 5. Cognitive neuroscience. 6. Neurobiology.
I. Title.
[DNLM: 1. Brain—physiology. 2. Music—psychology. 3. Auditory Perception—physiology.
4. Cognition—physiology. 5. Language. WL 300 P295m 2007]
ML3830.P33 2007
781'.11—dc22 2007014189
9 8 7 6 5 4 3 2 1
Printed in the United States of America
on acid-free paper
The existence of this book reflects the support and vision of two of the twentieth century’s great natural scientists: Edward O. Wilson and Gerald M. Edelman. At Harvard University, Wilson gave me the freedom to pursue a Ph.D. on the biology of human music at a time when the topic was still in the academic hinterland. With his support, my graduate experiences ranged from field expeditions to Papua New Guinea, to collaborative work with specialists on music and brain (including leading figures such as Isabelle Peretz in Montreal), to training at the Max Planck Institute for Psycholinguistics (where Claus Heeschen introduced me to the world of language science). I spoke with Wilson while I was formulating this book, and his support helped launch the project. His own scientific writings inspired me to believe in the importance of synthesizing a broad variety of information when exploring an emerging research area.
After completing my degree I was fortunate to be hired at The Neurosciences Institute (NSI), where Gerald Edelman, with the help of W. Einar Gall, has created an unusual environment that encourages a small and interactive group of neuroscientists to pursue basic questions about the brain, taking their research in whatever directions those questions lead. Edelman’s broad vision of neuroscience and of its relationship to humanistic knowledge has shaped this project and my own approach to science. As a consequence of Edelman and Gall’s support of music research, I have once again had the freedom to work on a wide variety of issues, ranging from human brain imaging to the study of music perception in aphasia to the drumming abilities of elephants in Thailand. Equally importantly, I have had the benefit of extremely bright and talented colleagues, whose work has greatly enriched my knowledge of neuroscience. Edelman, who has remarked that science is imagination in the service of the verifiable truth, has created a community where imagination is encouraged while empirical research is held to the highest standard. The salutary effect of such an environment on a young scientist cannot be overestimated.
At NSI I have also had the privilege to be the Esther J. Burnham Senior Fellow. Esther Burnham’s adventurous life and her passion for both the sciences and the arts continue to be an inspiration for me and my work. I am also grateful to the Neurosciences Research Foundation, the Institute’s parent organization, for its continuing support of my work.
One person who spanned my last few years at Harvard and my first few years at NSI was Evan Balaban. Evan introduced me to the world of auditory neuroscience and to a creative and critical approach to interdisciplinary scientific research. Another person who has been a particularly close colleague is John Iversen, a deeply insightful scientist with whom I’m pleased to work on a daily basis.
This book has benefited from the expertise and constructive criticism of numerous outstanding scholars. John Sloboda, Carol Krumhansl, and D. Robert Ladd read the entire manuscript and provided comments that improved the book in important ways. I am also grateful to Bruno Repp, John Iversen, the late Peter Ladefoged, Jeff Elman, Sun-Ah Jun, Emmanuel Bigand, Stephen Davies, Bob Slevc, Erin Hannon, Lauren Stewart, Sarah Hawkins, Florian Jaeger, Benjamin Carson, and Amy Schafer for insightful comments on individual chapters, and to Oliver Sacks for sharing with me his observations and eloquent writings on music and the brain. Heidi Moomaw provided valuable assistance in organizing the index and references. I would also like to thank Joan Bossert at Oxford University Press for her energetic commitment to this project from its outset. Joan and Abby Gross in the editorial department and Christi Stanforth in production have been an outstanding publishing team.
On a personal note, my family has been essential in bringing this project to completion. My mother, Jyotsna Pandit Patel, has been an unending source of encouragement and inspiration. Kiran and Neelima Pandit, my late grandmother Indira Pandit, and Shirish and Rajani Patel have given their support in essential ways. Jennifer Burton has been a key part of this project from the beginning, as a fellow writer and scholar, as my wife, and as the mother of our two children, Roger and Lilia Burtonpatel.
As I intimated at the start of this preface, the past decade has seen a transformation in the study of music and the brain. A loose affiliation of researchers has developed into a dynamic research community whose numbers are growing steadily each year. I have been fortunate to be part of this community during this formative time. If this book can contribute to the growing excitement and promise of our field, I will have more than accomplished my goal.
Chapter 1
Introduction
Chapter 2
Sound Elements: Pitch and Timbre
2.1 Introduction
2.2 Musical Sound Systems
2.3 Linguistic Sound Systems
2.4 Sound Category Learning as a Key Link
2.5 Conclusion
Appendixes
Chapter 3
Rhythm
3.1 Introduction
3.2 Rhythm in Music
3.3 Rhythm in Speech
3.4 Interlude: Rhythm in Poetry and Song
3.5 Nonperiodic Aspects of Rhythm as a Key Link
3.6 Conclusion
Appendixes
Chapter 4
Melody
4.1 Introduction
4.2 Melody in Music: Comparisons to Speech
4.3 Speech Melody: Links to Music
4.4 Interlude: Musical and Linguistic Melody in Song
4.5 Melodic Statistics and Melodic Contour as Key Links
4.6 Conclusion
Appendix
Chapter 5
Syntax
5.1 Introduction
5.2 The Structural Richness of Musical Syntax
5.3 Formal Differences and Similarities Between Musical and Linguistic Syntax
5.4 Neural Resources for Syntactic Integration as a Key Link
5.5 Conclusion
Chapter 6
Meaning
6.1 Introduction
6.2 A Brief Taxonomy of Musical Meaning
6.3 Linguistic Meaning in Relation to Music
6.4 Interlude: Linguistic and Musical Meaning in Song
6.5 The Expression and Appraisal of Emotion as a Key Link
6.6 Conclusion
Chapter 7
Evolution
7.1 Introduction
7.2 Language and Natural Selection
7.3 Music and Natural Selection
7.4 Music and Evolution: Neither Adaptation nor Frill
7.5 Beat-Based Rhythm Processing as a Key Research Area
7.6 Conclusion
Appendix
Afterword
References
List of Sound Examples
List of Credits
Author Index
Subject Index
Language and music define us as human. These traits appear in every human society, no matter what other aspects of culture are absent (Nettl, 2000). Consider, for example, the Pirahã, a small tribe from the Brazilian Amazon. Members of this culture speak a language without numbers or a concept of counting. Their language has no fixed terms for colors. They have no creation myths, and they do not draw, aside from simple stick figures. Yet they have music in abundance, in the form of songs (Everett, 2005).
The central role of music and language in human existence and the fact that both involve complex and meaningful sound sequences naturally invite comparison between the two domains. Yet from the standpoint of modern cognitive science, music-language relations have barely begun to be explored. This situation appears to be poised to change rapidly, as researchers from diverse fields are increasingly drawn to this interdisciplinary enterprise. The appeal of such research is easy to understand. Humans are unparalleled in their ability to make sense out of sound. In many other branches of our experience (e.g., visual perception, touch), we can learn much from studying the behavior and brains of other animals because our experience is not that different from theirs. When it comes to language and music, however, our species is unique (cf. Chapter 7, on evolution). This makes it difficult to gain insight into language or music as a cognitive system by comparing humans to other organisms. Yet within our own minds are two systems that perform remarkably similar interpretive feats, converting complex acoustic sequences into perceptually discrete elements (such as words or chords) organized into hierarchical structures that convey rich meanings. This provides a special opportunity for cognitive science. Specifically, exploring both the similarities and the differences between music and language can deepen our understanding of the mechanisms that underlie our species’ uniquely powerful communicative abilities.
Of course, interest in music-language relations does not originate with modern cognitive science. The topic has long drawn interest from a wide range of thinkers, including philosophers, biologists, poets, composers, linguists, and musicologists. Over 2,000 years ago, Plato claimed that the power of certain musical modes to uplift the spirit stemmed from their resemblance to the sounds of noble speech (Neubauer, 1986). Much later, Darwin (1871) considered how a form of communication intermediate between modern language and music may have been the origin of our species’ communicative abilities. Many other historical figures have contemplated music-language relations, including Vincenzo Galilei (father of Galileo), Jean-Jacques Rousseau, and Ludwig Wittgenstein. This long line of speculative thinking has continued down to the modern era (e.g., Bernstein, 1976). In the era of cognitive science, however, research into this topic is undergoing a dramatic shift, using new concepts and tools to advance from suggestions and analogies to empirical research.
Part of what animates comparative research is a tension between two perspectives: one that emphasizes the differences between music and language, and one that seeks commonalities. Important differences do, of course, exist. To take a few examples, music organizes pitch and rhythm in ways that speech does not, and lacks the specificity of language in terms of semantic meaning. Language grammar is built from categories that are absent in music (such as nouns and verbs), whereas music appears to have much deeper power over our emotions than does ordinary speech. Furthermore, there is a long history in neuropsychology of documenting cases in which brain damage or brain abnormality impairs one domain but spares the other (e.g., amusia and aphasia). Considerations such as these have led to the suggestion that music and language have minimal cognitive overlap (e.g., Marin & Perry, 1999; Peretz, 2006).
This book promotes the alternative perspective, which emphasizes commonalities over differences. This perspective claims that these two domains, although having specialized representations (such as pitch intervals in music, and nouns and verbs in language), share a number of basic processing mechanisms, and that the comparative study of music and language provides a powerful way to explore these mechanisms. These mechanisms include the ability to form learned sound categories (Chapter 2), to extract statistical regularities from rhythmic and melodic sequences (Chapters 3 and 4), to integrate incoming elements (such as words and musical tones) into syntactic structures (Chapter 5), and to extract nuanced emotional meanings from acoustic signals (Chapter 6). The evidence supporting this perspective comes from diverse strands of research within cognitive science and neuroscience, strands that heretofore have not been unified in a common framework. The final chapter of the book (Chapter 7) takes an evolutionary perspective, and uses music-language comparisons to address the persistent question of whether music is an evolutionary adaptation.
Throughout the book the focus is on the relationship between ordinary spoken language and purely instrumental music. It is worth explaining the motivation for this choice, because initially it may seem more appropriate to compare instrumental music to an artistic form of language such as poetry, or to focus on vocal music, where music and language intertwine. The basic motivation is a cognitive one: To what extent does the making and perceiving of instrumental music draw on cognitive and neural mechanisms used in our everyday communication system? Comparing ordinary language to instrumental music forces us to search for the hidden connections that unify obviously different phenomena.
In taking such an approach, of course, it is vital to sift real connections from spurious ones, to avoid the distraction of superficial analogies between music and language. This requires a solid understanding of the structure of musical and linguistic systems. In consequence, each chapter discusses in detail the structure of music and/or language with regard to the chapter’s topic, for example, rhythm or melody. These sections (sections 2 and 3 of each chapter) provide the context for the final section of the chapter, which explores a key cognitive link between music and language, providing empirical evidence and pointing the way to future studies.
Because music-language relations are of interest to researchers from diverse fields, this book is written to be accessible to individuals with primary training in either music or language studies. In addition, each chapter can stand largely on its own, for readers who are primarily interested in a selected topic (e.g., syntax, evolution). Because each chapter covers a good deal of material, there is a sectional organization within each chapter. Each chapter also begins with a detailed table of contents, which can be used as a roadmap when reading.
My hope is that this book will help provide a framework for those interested in exploring music-language relations from a cognitive perspective. Whether one’s theoretical perspective favors the search for differences or commonalities, one thing is certain: The comparative approach is opening up entirely new avenues of research, and we have just started the journey.
Chapter 2
Sound Elements
2.1 Introduction
2.2 Musical Sound Systems
2.2.1 Introduction to Musical Sound Systems
2.2.2 Pitch Contrasts in Music
Introduction to Musical Scales
Cultural Diversity and Commonality in Musical Scale Systems
2.2.3 Pitch Intervals as Learned Sound Categories in Music
Pitch Intervals and a Perceptual Illusion
Pitch Intervals and Melody Perception
Pitch Intervals and Categorical Perception
Pitch Intervals and Neuroscience
2.2.4 Timbral Contrasts in Music
The Rarity of Timbral Contrasts as a Basis for Musical Sound Systems
Example of a Timbre-Based Musical System
2.3 Linguistic Sound Systems
2.3.1 Introduction to Linguistic Sound Systems
2.3.2 Pitch Contrasts in Language
Pitch Contrasts Between Level Tones in Language: General Features
A Closer Look at Pitch Contrasts Between Level Tones in Tone Languages
Absolute Pitch in Speech?
Mapping Linguistic Tone Contrasts Onto Musical Instruments
2.3.3 Timbral Contrasts in Language
Timbral Contrasts in Language: Overview
Timbral Contrasts Among Vowels
Spectrum, Timbre, and Phoneme
A Brief Introduction to the Spectrogram
Mapping Linguistic Timbral Contrasts Onto Musical Sounds
2.3.4 Consonants and Vowels as Learned Sound Categories in Language
Consonants: Perception of Nonnative Contrasts
Vowels: Brain Responses to Native Versus Nonnative Contrasts
2.4 Sound Category Learning as a Key Link
2.4.1 Background for Comparative Studies
Dissociations
Hemispheric Asymmetries
A “Speech Mode” of Perception
Summary of Background for Comparative Studies
2.4.2 Relations Between Musical Ability and Linguistic Phonological Abilities
2.4.3 Sound Category-Based Distortions of Auditory Perception
2.4.4 Decay in Sensitivity to Nonnative Sound Categories
2.4.5 Exploring a Common Mechanism for Sound Category Learning
2.5 Conclusion
Appendixes
A.1 Some Notes on Pitch
A.2 Semitone Equations
A.3 Theories for the Special Perceptual Qualities of Different Pitch Intervals
A.3.1 Sensory Consonance and Dissonance as the Basis for Musical Intervals
A.3.2 Pitch Relationships as the Basis for Musical Intervals
A.3.3 The Overtone Series as the Basis for Musical Intervals
A.3.4 Future Directions in Research on the Basis of Musical Intervals
A.4 Lexical Pitch Scaling as a Proportion of the Current Range
Every human infant is born into a world with two distinct sound systems. The first is linguistic and includes the vowels, consonants, and pitch contrasts of the native language. The second is musical and includes the timbres and pitches of the culture’s music. Even without explicit instruction, most infants develop into adults who are proficient in their native language and who enjoy their culture’s music. These traits come at a price, however; skill in one language can result in difficulty in hearing or producing certain sound distinctions in another, and a music lover from one culture may find another culture’s music out of tune and annoying.1 Why is this so? The simple answer is that our native sound system leaves an imprint on our minds. That is, learning a sound system leads to a mental framework of sound categories for our native language or music. This framework helps us extract distinctive units from physical signals rich in acoustic variation. Such frameworks are highly adaptive in our native sonic milieu, but can be liabilities when hearing another culture’s language or music, because we “hear with an accent” based on our native sound system.
Of course, music and speech have one very obvious difference in their sound category systems. Although pitch is the primary basis for sound categories in music (such as intervals and chords), timbre is the primary basis for sound categories in speech (e.g., vowels and consonants). This chapter compares music and speech in terms of the way they organize pitch and timbre. Section 2.2 focuses on music, with an emphasis on musical scales and on pitch intervals as learned sound categories in music. This section also discusses musical timbre, and explores why timbral contrasts are rarely the basis for musical sound systems. Section 2.3 then turns to language, and discusses how linguistic pitch and timbral organization compare to music. As we shall see, comparing “like to like” in this way (e.g., comparing musical and linguistic pitch systems, or musical and linguistic timbral systems) highlights the differences between music and speech. If, however, one focuses on cognitive processes of sound categorization, then similarities begin to emerge. In fact, there is growing evidence that speech and music share mechanisms for sound category learning, even though the two domains build their primary sound categories from different features of sound. The empirical comparative work supporting this idea will be reviewed in section 2.4. The implication of this work is that although the end products of sound category learning in music and speech are quite different (e.g., mental representations of pitch intervals vs. consonants), the processes that create sound categories have an important degree of overlap.
Before embarking on a comparison of speech and music, it is worth stepping back and viewing these sound systems in a broader biological perspective. What, if anything, distinguishes them from the great diversity of acoustic communication systems used by other animals? It is often noted that speech and music are “particulate” systems, in which a set of discrete elements of little inherent meaning (such as tones or phonemes) are combined to form structures with a great diversity of meanings (Hockett & Altmann, 1968; Merker, 2002). This property distinguishes speech and music from the holistic sound systems used by many animals, in which each sound is associated with a particular meaning but sounds are not recombined to form new meanings (a celebrated example of such a system is that of the Vervet monkey, Cercopithecus aethiops, an African primate with distinct alarm calls for different predators; Cheney & Seyfarth, 1982).
One might nevertheless argue that particulate sound systems are not unique to humans. Male humpback whales, for example, sing complex songs consisting of discrete elements organized into phrases and themes. Furthermore, individuals within a group converge on a very similar song, which changes incrementally throughout each breeding season, providing evidence that the elements and their patterning are learned (Payne, 2000; cf. Noad et al., 2000). Crucially, however, there is no evidence for a rich relationship between the order of elements and the meaning of the song. Instead, the songs always seem to mean the same thing, in other words, a combination of a sexual advertisement to females and an intermale dominance display (Tyack & Clark, 2000). A similar point has been made about bird songs, which can have a particulate structure with discrete elements recombined to form novel sequences (e.g., in mockingbirds). Despite this structural feature, however, the songs appear to always convey the same small set of meanings, including readiness to mate, territorial display (Marler, 2000), and in some cases, an individual’s identity (Gentner & Hulse, 1998).
Thus the particulate nature of speech and music is unique among biological sound systems. This fact alone, however, cannot be taken as evidence for some deep commonality between speech and music in terms of cognitive processing. Rather, both could have independently become particulate systems because such a system is a good solution to a certain kind of problem: namely, how to communicate a wide range of meanings in an economical way. For example, DNA, which is certainly not a product of the human mind, is a particulate system that transmits a great diversity of meanings (genetic information) using a finite set of discrete elements: the four chemical bases adenine, cytosine, guanine, and thymine. Thus, whether any significant cognitive similarities exist between spoken and musical sound systems is a question that requires empirical investigation.
One reason to suspect that there may be such similarities concerns an important difference between speech and music on the one hand and the particulate system of DNA on the other. Each chemical base in a DNA strand has an invariant physical structure. In contrast, any given building block of a spoken or musical sound system (such as a particular vowel or musical pitch interval) may vary in physical structure from token to token and as a function of context (Burns, 1999; Stevens, 1998). The mind must find some way to cope with this variability, separating variation within a category from variation that constitutes a change in category. Furthermore, the mapping between sounds and categories depends on the native language or music. One well-known example from language concerns the English phonemes /l/ and /r/. Although it may seem obvious that these are two distinct sound categories to an English speaker, to a Japanese speaker these sounds are merely two versions of the same speech sound and can be quite difficult to discriminate acoustically (Iverson et al., 2003). Analogously in music, the distinction between melodic pitch intervals of a major and minor third in Western European music may be irrelevant for the music of some cultures from New Guinea, where these are treated as variants of a single sound category (Chenoweth, 1980). Both of these examples illustrate that the “particles” of speech and music are not physical entities like a chemical base: They are psychological entities derived from a mental framework of learned sound categories.
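The category-versus-token distinction described above can be made concrete with a small numerical illustration (not from the book; the functions and labels below are hypothetical, and assume the standard equal-tempered convention of 12 semitones per octave). A just-intoned major third (frequency ratio 5:4, about 3.86 semitones) and minor third (6:5, about 3.16 semitones) both deviate physically from the 4- and 3-semitone category centers, yet enculturated Western listeners hear them as members of those categories:

```python
import math

def interval_in_semitones(f1, f2):
    """Size in semitones of the interval between two frequencies (Hz),
    using the equal-tempered convention: 12 semitones per doubling."""
    return 12 * math.log2(f2 / f1)

# Western interval-category labels for a few integer semitone counts.
NAMES = {3: "minor third", 4: "major third"}

def classify(f1, f2):
    """Snap a continuously variable physical interval to the nearest
    semitone category, as an idealized listener might."""
    return NAMES.get(round(interval_in_semitones(f1, f2)), "other")

print(classify(440.0, 550.0))  # 5:4 ratio (~3.86 semitones) -> "major third"
print(classify(440.0, 528.0))  # 6:5 ratio (~3.16 semitones) -> "minor third"
```

The point of the sketch is the rounding step: category membership is a psychological mapping imposed on acoustically variable tokens, not a property of the physical signal itself.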
The 19th and 20th centuries saw a great expansion of research on the diversity of human musical systems. (For accessible introductions, see Pantaleoni, 1985; Reck, 1997; and Titon, 1996.) This research has led to one clear conclusion: There are very few universals in music (Nettl, 2000). Indeed, if “music” is defined as “sound organized in time, intended for, or perceived as, aesthetic experience” (Rodriguez, 1995, cited in Dowling, 2001:470) and “universal” is defined as a feature that appears in every musical system, it is quite clear that there are no sonic universals in music, other than the trivial one that music must involve sound in some way. For example, it is quite possible to have a modern piece of electronic music with no variation in pitch and no salient rhythmic patterning, consisting of a succession of noises distinguished by subtle differences in timbre. As long as the composer and audience consider this music, then it is music. Or consider John Cage’s famous piece 4′33′′, in which a musician simply sits in front of a piano and does nothing while the audience listens in silence. In this case, the music is the ambient sound of the environment (such as the breathing sounds of the audience, or the horn of a passing car) filtered through the audience’s intent to perceive whatever they hear in an aesthetic frame. These cases serve to illustrate that what is considered music by some individuals may have virtually nothing in common with what counts as music for others.
Nevertheless, if we restrict our attention to musical systems that are widely disseminated in their native culture, we begin to see certain patterns that emerge repeatedly. For example, two common properties of human music are the use of an organized system of pitch contrasts and the importance of musical timbre. The following two sections give an overview of pitch and timbral contrasts in music. When examining pitch contrasts, I focus on intervals rather than chords because the former are much more widespread in musical systems than are the latter.
Pitch is one of the most salient perceptual aspects of a sound, defined as “that property of a sound that enables it to be ordered on a scale going from low to high” (Acoustical Society of America Standard Acoustical Terminology, cf. Randel, 1978).2 The physical correlate of pitch is frequency (in cycles per second, or Hertz), and in a constant-frequency pure tone the pitch is essentially equal to the frequency of the tone (those desiring further introductory material on pitch may consult this chapter’s appendix 1). Of course, every sound has several perceptual aspects other than pitch, notably loudness, length, timbre, and location. Each of these properties can vary independently of the others, and the human mind is capable of distinguishing several categories along any of these dimensions (Miller, 1956). Yet pitch is the most common dimension for creating an organized system of musical elements. For example, all cultures have some form of song, and songs almost always feature a stable system of pitch contrasts. Why is pitch a privileged dimension in building musical systems? Why do we not find numerous cultures in which loudness is the basis of musical contrast, and pitch variation is incidental or even suppressed? In such hypothetical (but perfectly possible) cultures, music would have minimal pitch variation but would vary in loudness across several distinct levels, perhaps from note to note, and this variation would be the basis for structural organization and aesthetic response.
Perhaps the most basic reason that pitch is favored as the basis for musical sound categories is that musical pitch perception is multidimensional.3 For example, pitches separated by an octave (a doubling in frequency) are heard as very similar and are typically given the same name, referred to as the pitches’ pitch class or chroma (e.g., all the notes called “C” on a piano keyboard). Such octave equivalence is one of the few aspects of music that is virtually universal: Most cultures recognize the similarity of musical pitches separated by an octave, and even novice listeners show sensitivity to this relationship (Dowling & Harwood, 1986). For example, men and women asked to sing the “same tune” in unison often sing an octave interval without realizing it, and young infants and monkeys treat octave transpositions of tunes as more similar than other transpositions (Demany & Armand, 1984; Wright et al., 2000). It is likely that this aspect of music reflects the neurophysiology of the auditory system.4
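The octave-equivalence relation described above is easy to state computationally. The following is an illustrative sketch (not from the text): two frequencies share a chroma exactly when they are a whole number of octaves apart. The middle-C reference frequency is an arbitrary anchor chosen for the example.

```python
import math

def chroma(freq_hz, ref_hz=261.63):
    """Pitch class (0-11) of a frequency, treating octave-separated
    pitches as equivalent (ref_hz = middle C, an arbitrary anchor)."""
    semitones_above_ref = round(12 * math.log2(freq_hz / ref_hz))
    return semitones_above_ref % 12

# C4 (261.63 Hz) and C5 (523.25 Hz) are an octave apart: same chroma.
# G4 (392.00 Hz) lies 7 semitones above C4: a different chroma.
```

Doubling a frequency adds exactly 12 to the semitone count, so the modulo-12 step is what collapses all octaves of a note onto a single pitch class.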
Thus the perceived similarity of pitches is governed not only by proximity in terms of pitch height but also by identity in terms of pitch chroma. One way to represent these two types of similarity in a single geometric diagram is via a helix in which pitch height increases in the vertical direction while pitch chroma changes in a circular fashion (Shepard, 1982). In such a diagram, pitches that are separated by an octave are near each other (Figure 2.1).
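The helix can be sketched in coordinates: chroma determines the angle around the circle, and pitch height determines the elevation, so octave-related pitches stack vertically. A minimal sketch (the radius and rise-per-octave parameters are arbitrary choices for illustration):

```python
import math

def helix_coords(semitones_above_ref, radius=1.0, rise_per_octave=1.0):
    """Position of a pitch on a Shepard-style helix: chroma sets the
    angle around the circle, pitch height sets the elevation."""
    angle = 2 * math.pi * (semitones_above_ref % 12) / 12
    x = radius * math.cos(angle)
    y = radius * math.sin(angle)
    z = rise_per_octave * semitones_above_ref / 12
    return (x, y, z)

# Pitches an octave apart share (x, y) and differ only in elevation,
# placing them directly above one another on the helix.
```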
In contrast to this two-dimensional perceptual topology for pitch, there is no evidence for a second perceptual dimension linking sounds of different loudness, suggesting one reason why pitch is favored over loudness as a basis for structuring musical sound. Furthermore, individual pitches can be combined simultaneously to create new sonic entities (such as intervals and chords) that have distinctive perceptual qualities. For example, a pitch interval of six semitones (a tritone) has a distinctively rough perceptual quality compared to the smoother sound of intervals of five or seven semitones (a musical fourth and fifth, respectively). In contrast, there is no evidence that combining sounds of differing degrees of loudness leads to a perceptually diverse palette of new sounds. This difference between pitch and loudness is likely due to the nature of the auditory system, which is skilled at separating sound sources based on their pitch but not on their amplitude, thus providing a basis for greater sensitivity to pitch relations than to loudness relations.
Figure 2.1 The pitch helix. Pitches are arranged along a line that spirals upward, indicating increasing pitch height, and that curves in a circular way so that pitches with the same chroma are aligned vertically, indicating octave equivalence. Adapted from Shepard, 1982.
One of the striking commonalities of musical pitch systems around the world is the organization of pitch contrasts in terms of a musical scale, in other words, a set of distinct pitches and intervals within the octave that serve as reference points in the creation of musical patterns. Because pitch intervals will be the primary focus of our examination of sound categories in music, it is important to have a grasp of musical scale structure. In this section, I introduce some basic aspects of scale structure, focusing on Western European music because of its familiarity. To truly appreciate the cognitive significance of scale structure, however, it is necessary to examine scales cross-culturally, which is the topic of the next section.
In Western European “equal-tempered” music (the basis of most of Western music today), each octave is divided into 12 equal-sized intervals such that each note is approximately 6% higher in frequency than the note below.5 This ratio is referred to as a “semitone.” (See this chapter’s appendix 2 for equations relating frequency ratios to semitones. Note that precise interval measurements are often reported in “cents,” in which 1 semitone = 100 cents.) The 12 semitones of the octave are the “tonal material” of Western music (Dowling, 1978): They provide the raw materials from which different scales are constructed. A musical scale consists of a particular choice of intervals within the octave: Typically this choice repeats cyclically in each octave. For example, the set of ascending intervals (in semitones) [2 2 1 2 2 2 1] defines a “major scale” in Western music, a scale of seven pitches per octave (hence the term “octave,” which is the eighth note of the scale). For example, starting on note C results in a C major scale, consisting of the notes [C D E F G A B C′], in which C′ designates the C an octave above the original.
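The arithmetic above can be made concrete. In this illustrative sketch, we compute the equal-tempered semitone ratio and the cents conversion, and build a major scale from the interval pattern [2 2 1 2 2 2 1] (the note-name table and function names are choices made for the example):

```python
import math

SEMITONE = 2 ** (1 / 12)  # equal-tempered step, ~6% increase in frequency

def cents(ratio):
    """Size of a frequency ratio in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(ratio)

# Build a major scale from the ascending interval pattern [2 2 1 2 2 2 1].
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_PATTERN = [2, 2, 1, 2, 2, 2, 1]

def major_scale(start=0):
    degrees = [start]
    for step in MAJOR_PATTERN:
        degrees.append(degrees[-1] + step)
    return [NOTE_NAMES[d % 12] for d in degrees]

# major_scale() starting on C yields C D E F G A B C'.
```

Because the seven steps of the pattern sum to 12 semitones, the scale closes exactly on the octave of its starting note.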
One notable feature of the major scale is that it contains several intervals whose frequency ratios approximate small, whole-number ratios. For example, the intervals between the fifth, fourth, and third note of the scale and the first note are 7, 5, and 4 semitones. By converting semitones to frequency ratios, one can compute the frequency ratios of these intervals as 1.498, 1.335, and 1.260, respectively, values that are quite close to the ratios 3/2, 4/3, and 5/4. This is no coincidence: Western music theory has long valued intervals with simple, small-number frequency ratios, and these intervals play an important role in Western musical structure.6 The Western fascination with such ratios in music dates back to Pythagoras, who noted that when strings with these ratio lengths were plucked simultaneously, the resulting sound was harmonious. Pythagoras had no real explanation for this, other than to appeal to the mystical power of numbers in governing the order of the universe (Walker, 1990, Ch. 3).
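The conversion described here can be checked directly. A small sketch computing the equal-tempered ratios for intervals of 7, 5, and 4 semitones and comparing them to the just ratios 3/2, 4/3, and 5/4:

```python
def et_ratio(semitones):
    """Frequency ratio of an equal-tempered interval of the given size."""
    return 2 ** (semitones / 12)

# Fifth (7 semitones), fourth (5), and major third (4) vs. just ratios.
comparisons = [(7, 3 / 2), (5, 4 / 3), (4, 5 / 4)]
for size, just in comparisons:
    print(f"{size} semitones: {et_ratio(size):.3f}  (just ratio {just:.3f})")
```

The equal-tempered values 1.498, 1.335, and 1.260 fall within a fraction of a percent of the just ratios, which is why equal temperament is an acceptable compromise for these intervals.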
As science has progressed, there have been increasingly plausible theories for the special status of certain musical intervals in perception, based on what is known about the structure and function of the auditory system. We will not delve into these theories here (interested readers may consult this chapter’s appendix 3). The relevant point is that the desire to demonstrate a natural basis for scale structure has been a refrain in the study of Western music. Sometimes this desire has amusing consequences, as when Athanasius Kircher, in the influential Musurgia Universalis of 1650, reported that the natural laws that shaped European music applied even in the exotic jungles of the Americas, where sloths sang in the major scale (Clark & Rehding, 2001). There has also been a more serious consequence of this view, however, namely to focus scientific studies of scale structure on the auditory properties of musical intervals rather than the cognitive processes that underlie the acquisition and maintenance of pitch intervals as learned sound categories in the mind. That is, the Western perspective has tended to emphasize scales as natural systems rather than as phonological systems. To gain an appreciation of the latter view, it is necessary to look beyond Western culture.
If the structure of the auditory system provided a strong constraint on musical scales, one would expect scales to show little cultural variability. It is true that certain intervals appear in the scales of many cultures. For example, the fifth (which is the most important interval in Western music after the octave) is also important in a wide range of other musical traditions, ranging from large musical cultures in India and China to the music of small tribes in the islands of Oceania (Jairazbhoy, 1995; Koon, 1979; Zemp, 1981). A number of lines of empirical research suggest that this interval does have a special perceptual status, due to the nature of the auditory system (cf. this chapter’s appendix 3).
A cross-cultural perspective reveals, however, that the auditory system provides a rather weak set of constraints on scale structure. In the late 19th century, Alexander Ellis conducted experiments on instruments that had arrived in Europe from all over the globe, and found a great diversity in musical scales (Ellis, 1885). Hermann von Helmholtz, one of the founders of music psychology, was well aware of such variation. In On the Sensations of Tone (1877:236), he wrote, “Just as little as the gothic painted arch, should our diatonic major scale be regarded as a natural product.” He went on to note that although virtually all music used fixed pitches and intervals in building melodies, “In selecting the particular degrees of pitch, deviations of national taste become immediately apparent. The number of scales used by different nations is by no means small” (p. 253). The research of Ellis and his successors in the field of ethnomusicology has shown that although certain intervals are widespread in musical scales, such as the octave and fifth, it is certainly not the case that the scales used in Western music (or any music, for that matter) reflect a mandate of nature.
To appreciate the cognitive significance of this diversity, we will need to delve into some of the key differences and similarities among musical scales. Before doing this, it is worth noting that the concept of a scale (as a set of ordered intervals spanning the octave, from which melodies are made) does not exist in all musical cultures, even though the music may use an organized set of intervals. For example, Will and Ellis (1994) spliced the most commonly sounded frequencies out of a Western Australian aboriginal song and organized them as a scale, and then played them back to the original singer. “He commented that he could make nothing of this.… He pointed out through a new performance that a song was only well sung when it contained the connections (glides) between the peak frequencies as well as the ornamentations that characterized… the melody” (p. 12). Thus singers in a culture can organize their melodies in terms of stable pitch contrasts without having any explicit concept of a musical scale: Indeed, this is likely to be the case for many people in Western culture who enjoy singing but who have never studied music. When I speak of scales, then, I am speaking not of theoretical constructs within a culture, but of the patterning of intervals based on empirical measurements made from songs or from the sounds of musical instruments (cf. Meyer, 1956:216-217).
DIFFERENCES AMONG SCALE SYSTEMS

Widely used scale systems have been shown to differ in at least four ways. First, they differ in the amount of “tonal material” within each octave available for choosing pitches (Dowling, 1978). In Western music, there are 12 available pitches per octave, out of which 7 are typically chosen to make a musical scale such as the diatonic major scale. (The first 7 notes of “Joy to the World” contain the pitches of this scale in descending order.) In contrast, scales in Indian classical music typically choose 7 pitches from among 22 possible pitches in each octave, separated by approximately 1/2 semitone.7 Thus in Indian music, two scales can differ only in terms of “microtones,” as in Figure 2.2a (cf. Ayari & McAdams, 2003, for a discussion of microtones in Arabic music).
Second, scales differ in the number of pitches chosen per octave, ranging from 2 in some Native American musical systems to 7 tones per octave in a variety of cultures, including cultures in Europe, India, and Africa (Clough et al., 1993;Nettl, 1954; Nketia, 1974), with the most common number of tones per octave being 5 (Van Khê, 1977).
Figure 2.2a The ascending scales of two North Indian ragas. A raga is a musical form that includes a specific scale, particular melodic movements, and other features (Bor, 1999). In Rag Todi, three tones (d♭, e♭, and a♭) are slightly flatter (about 1/4 semitone) than their counterparts in Rag Multani (indicated by arrows). Courtesy of Dr. Arun Dravid and Michael Zarky.
Third, scales differ in interval patterns. Thus two scales may both use 7 pitches per octave, but arrange the spacing of these pitches in very different ways. This is illustrated in Figure 2.2b, which shows the intervals of the Western major scale in comparison to pitch intervals of the 7-tone Javanese pelog scale used in Gamelan music. (Another Javanese scale, with 5 tones per octave, is also shown for comparison.) The Javanese scales are notable for their lack of intervals which correspond to small-whole number frequency ratios (Perlman & Krumhansl, 1996).
Finally, scale systems vary dramatically in how standardized the tunings are across different instruments. For example, in a study of 22 Javanese gamelans tuned using the slendro scale, Perlman found that no interval varied by less than 30 cents, with some intervals varying as much as 75 cents (cited in Perlman & Krumhansl, 1996; cf. Arom et al., 1997, for data on interval variability in African xylophones). Does such variability imply that intervals are not important to people who listen to such music? Not necessarily: The lack of standardization between instruments may simply mean that listeners develop rather broad interval standards (Arom et al., 1997; Cooke, 1992).
Collectively, these four types of variation argue against the notion that the primary force in shaping musical scales is the pursuit of intervals with simple, small-number ratios. Outside of Western culture, the dream of Pythagoras is not in accord with the facts.
Figure 2.2b Frequency spacing between tones in the Western major scale and in two Javanese scales (pelog and slendro). The thin vertical lines mark 20-cent intervals. The tones of the Western scale are labeled with their solfege names. By convention, the five notes of the slendro scale are notated 1, 2, 3, 5, 6. Adapted from Sutton, 2001.
COMMONALITIES AMONG SCALE SYSTEMS

Does the diversity of scale systems mean that the search for universal influences on scale systems should be discarded? Not at all. Indeed, the very fact of such diversity makes any widespread commonalities all the more significant in terms of suggesting common auditory and/or cognitive predispositions, as was observed earlier with the octave and musical fifth. Three such commonalities can be noted.
The first and most obvious commonality concerns the number of tones per octave, which typically ranges between 5 and 7. Even in cultures with microtonal divisions of the octave, such as India, any given scale typically involves between 5 and 7 pitches. Importantly, this limit is not predicted by human frequency discrimination, which is capable of distinguishing many more tones per octave. Instead, the limit is almost certainly due to the compromise between the desire for aesthetic variety and universal constraints on the number of distinct categories the human mind can reliably keep track of along a single physical continuum (Clough et al., 1993; Miller, 1956).
The second commonality concerns the size distribution of intervals in scales. Intervals between adjacent tones in a scale tend to be between 1 and 3 semitones in size (Nettl, 1954; Voisin, 1994). Nettl (2000) has observed that almost all cultures rely heavily on an interval of about 2 semitones in size in the construction of melodies. Even cultures with only two tones (1 interval) in their scale, such as certain native American tribes (Nettl, 1954:15), choose an interval in this range, though in principle they could opt for a single large interval. The reason large intervals are avoided may simply be ergonomic: Such scales would be associated with melodies that are awkward to sing because of the large pitch leaps. More interesting is the avoidance of intervals less than a semitone (“microintervals”), especially in circumstances in which there is an opportunity for them to occur. For example, in Indian classical music one could construct a scale in which two adjacent tones are separated by 1/2 semitone, but such scales do not occur in practice. The one exception to this pattern of which I am aware comes from Melanesian panpipes, in which Zemp (1981) recorded two tubes from a single set of pipes that differed by only 33 cents. However, this turned out to be an exception that proved the rule, as the tubes were placed on opposite ends of the panpipes and were never played in succession. Thus the auditory system seems to favor intervals of at least a semitone between adjacent notes of a scale. The reason for this is suggested by research by Burns and Ward (1978), who examined the ability of musical novices to discriminate between two intervals on the basis of their size. In their study, the absolute frequency of the lower tone of the interval was allowed to vary. In such a context, which prevents a strategy of using a fixed lower pitch as a reference and forces true interval comparison, the threshold for discrimination was about 80 cents. 
This suggests that in musical contexts intervals should differ by at least a semitone if they are to be reliably discriminated in size. This is precisely what is ensured when the smallest interval in the scale is about 1 semitone.
The third commonality, and perhaps the most interesting from a cognitive standpoint, concerns the patterning of interval sizes within musical scales. It is far more common for scales to have intervals of different sizes (e.g., 1 and 2 semitones in the case of the Western major scale) than to have all intervals of approximately the same size.8 Put more succinctly, asymmetric scales are far more common than symmetric ones. Examples of symmetric scales include Javanese slendro, with 5 intervals of almost equal size, and the Western whole-tone scale, with 6 intervals of equal size, which was used by Claude Debussy. What might account for the dominance of asymmetric scales? Balzano (1980) and others have noted that a pattern of unequal intervals serves to make each tone in the scale unique in terms of its pattern of intervals with respect to other scale tones.
This property of uniqueness may help listeners maintain a sense of where they are in relation to the first tone of the scale, a property that would help the first tone to act as a perceptual anchor point or tonal center. In symmetric scales, it may be more difficult to sense one’s position with respect to the tonal center on the basis of pitch intervals alone, so that tonal orientation would have to be provided by other cues, such as the clearly defined melodic and rhythmic cycles of Gamelan music. One interesting line of evidence with respect to this issue is a study of the perception of asymmetric versus symmetric scale structure by infants and adults. Trehub et al. (1999) constructed unfamiliar asymmetric and symmetric scales based on 7 intervals per octave, and tested the ability of infants and adults to detect subtle pitch changes in repetitions of these scales. Infants were better at detecting such changes in the asymmetric scales, even though both scales were unfamiliar, suggesting that the asymmetry conferred some processing advantage. Interestingly, however, adults showed poor performance on both types of unfamiliar scales, and performed well only when a familiar, major scale was used. In other words, any inherent processing advantage of the asymmetric scales was overwhelmed by the effect of cultural familiarity. Nevertheless, Trehub et al.’s results may be relevant to the origin of musical scale structure in diverse cultures, because in contexts in which there is no “familiar” scale, the cognitive system may be biased toward choosing an asymmetric scale structure over a symmetric one because of a predisposition for music with a clear tonal center.9
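Balzano's point about unequal intervals can be demonstrated concretely. In this illustrative sketch (the pitch-class encodings are standard conventions, not taken from the text), every tone of the asymmetric major scale has a unique pattern of intervals to the other scale tones, whereas all tones of the symmetric whole-tone scale share a single pattern and are thus interchangeable:

```python
def interval_profiles(scale):
    """For each scale tone, the set of intervals (semitones, mod 12)
    from that tone to every other tone of the scale."""
    return [frozenset((other - tone) % 12 for other in scale if other != tone)
            for tone in scale]

MAJOR = [0, 2, 4, 5, 7, 9, 11]    # asymmetric: steps of 1 and 2 semitones
WHOLE_TONE = [0, 2, 4, 6, 8, 10]  # symmetric: every step is 2 semitones

# Each major-scale tone has a unique profile (7 distinct), so a listener's
# position relative to the tonal center is recoverable from intervals alone;
# the whole-tone scale yields a single shared profile.
distinct_major = len(set(interval_profiles(MAJOR)))        # 7
distinct_whole = len(set(interval_profiles(WHOLE_TONE)))   # 1
```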
CULTURAL DIVERSITY AND COMMONALITY: CONCLUSIONS

What is the cognitive significance of the similarities and differences among scale systems? Treating similarities first, it appears that some commonalities (number of tones per octave and their spacing) can be explained by basic forces that shape communicative systems. These commonalities emerge because there are limits on the number of sound categories that can be discerned along a given physical continuum, and because of inherent tradeoffs between ergonomics and perceptual discriminability in sound systems. More interesting from a cognitive standpoint is the predominance of asymmetric scales. This suggests that most cultures favor scale patterns that promote a sense of orientation with respect to a tonal center, and raises the deeper question of why this should be. One possibility is that having a tonal center provides a cognitive reference point for pitch perception, which in turn makes it easier to learn and remember complex melodic sequences (Rosch, 1973, 1975; cf. Krumhansl, 1990).
Turning to the differences among scales, the great variability in scale structure suggests that the search for a “natural basis” for musical pitch intervals is an inherently limited endeavor. A broader field of enquiry emerges if one shifts the emphasis from scales as natural systems to scales as phonological systems, in other words, to an interest in how pitch categories are created and maintained in the context of an organized system of pitch contrasts. In this light, the natural basis of certain intervals (such as the fifth; cf. this chapter’s appendix 3) simply illustrates that there is a link between auditory predispositions and category formation. Such a link is not unique to music: Speech also exploits inherent aspects of the auditory system in forming organized sound contrasts. For example, the difference between voiced and voiceless consonants (such as /b/ and /p/) appears to exploit a natural auditory boundary that is independent of culture (Pisoni, 1977; Holt et al., 2004; Steinschneider et al., 2005). Thus it is likely that in both music and speech the inherent properties of the auditory system influence the “learnability” of certain categories and contrasts. In terms of understanding the broader question of how the mind makes and maintains sound categories, however, this is simply one piece of the puzzle.
Having described some fundamental aspects of musical scales and intervals, it is now time to take a cognitive perspective and examine pitch intervals as learned sound categories rather than as fixed frequency ratios between tones. The motivation for this is that in the real world, intervals are seldom realized in a precise, Platonic fashion. The “same” pitch interval can vary in size both due to chance variation and to the influence of local melodic context (Jairazbhoy & Stone, 1963; Levy, 1982; Morton, 1974; Rakowski, 1990). This is why listeners require mechanisms that help them recognize the structural equivalence of different tokens and map acoustic variability onto stable mental categories. The four subsections below provide evidence for intervals as learned sound categories by demonstrating that the processing of pitch relations is influenced by the structure of a culture’s interval system.
Shepard and Jordan (1984) provided early and compelling evidence for pitch intervals as learned sound categories. They had participants listen to an ascending sequence of tones that divided the octave into 7 equal-sized intervals (meaning that the ratio of successive pitches in the scale was 2^(1/7), or about 1.1, so that each pitch was about 10% higher in frequency than the one below it). Listeners were asked to judge the size of each interval in relation to the preceding one, and indicate whether the current interval was larger, smaller, or the same size. The listeners varied widely in their degree of musical training, and had just learned about the physical measure of log frequency in a college class. They were explicitly asked to base their judgments on the physical sizes of intervals and not their musical relations (they were not told that the scale had equal-sized steps). The interesting result was that participants judged the 3rd and 7th intervals to be larger than the others. These are precisely the intervals that are small (1 semitone) in a Western major scale, suggesting that listeners had assimilated the novel scale to a set of internal sound categories. It is especially notable that these learned internal interval standards were applied automatically, even when instructions did not encourage musical listening.
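The arithmetic of this equal division can be sketched in a few lines (the 440 Hz starting tone is an arbitrary choice for illustration, not a detail of the original study):

```python
def equal_division_scale(f0, n_divisions, octaves=1):
    """Frequencies of a scale dividing each octave into n_divisions
    equal steps on a log-frequency axis."""
    ratio = 2 ** (1.0 / n_divisions)  # ratio between successive pitches
    return [f0 * ratio ** k for k in range(n_divisions * octaves + 1)]

# The 7-fold equal division used by Shepard and Jordan (1984):
scale = equal_division_scale(440.0, 7)
step_ratio = 2 ** (1.0 / 7)  # ~1.104: each tone ~10% above the last
```

Because the steps are equal in log frequency, seven of them land exactly on the octave (twice the starting frequency).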
Although memory for highly familiar melodies includes some information about absolute pitch (Levitin, 1994; Schellenberg & Trehub, 2003), melody recognition clearly involves the perception of pitch relations because listeners can recognize the same melody in different registers (e.g., played on a tuba or a piccolo) or when played by the same instrument in different keys. One factor involved in melody recognition is the melodic contour: the patterns of ups and downs without regard to precise interval size. For example, the contour of the melody in Figure 2.3a can be represented as +,+,-,-,-,+.
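A contour in this plus/minus notation can be computed mechanically from any pitch sequence. In the sketch below, the MIDI-style pitch numbers are hypothetical, chosen only to reproduce the contour cited above, not the actual notes of Figure 2.3a:

```python
def contour(pitches):
    """Melodic contour: '+' for each rise, '-' for each fall,
    '0' for a repeated pitch (pitches given as, e.g., MIDI numbers)."""
    signs = []
    for a, b in zip(pitches, pitches[1:]):
        signs.append('+' if b > a else '-' if b < a else '0')
    return ','.join(signs)

# A 7-note melody whose contour matches the +,+,-,-,-,+ example:
example = contour([60, 62, 64, 63, 61, 59, 62])
```

Note that many different interval patterns share this one contour, which is exactly what the "same contour lure" stimuli described below exploit.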
Another factor is the sequence of precise intervals. The relative importance of these two factors has been explored in studies by Dowling and others over many years (e.g., Dowling, 1978; Edworthy, 1985a, b; Dowling et al., 1995). In a typical experiment, a listener hears a novel melody (e.g., Figure 2.3a), and then later hears a target melody (Figure 2.3b-d). The target melody can be a transposition that preserves the precise pattern of intervals between tones (an exact transposition, which amounts to a change of key, Figure 2.3b). In other cases, the target can be slightly altered from an exact transposition so that it has the same contour but a different pattern of intervals (a “same contour lure,” as in Figure 2.3c). Finally, the target can have an entirely different contour from the original (a “different contour lure,” in Figure 2.3d; cf. Sound Example 2.1).
Figure 2.3 Tone sequences B, C, and D have different relations to tone sequence A. (B = exact transposition, C = same contour but different intervals, D = different contour.) After Dowling, Kwak, & Andrews, 1995. Cf. sound example 2.1a-d.
Listeners are instructed about the difference between exact transpositions and the two different kinds of lures using a familiar melody as an example (such as “Twinkle, Twinkle, Little Star”), and told that during the experiment they should respond “yes” only to targets that are an exact transposition of the first melody.
When the delay between the two melodies is brief, listeners often confuse exact transpositions with same contour lures, indicating that their response is dominated by contour similarity (Dowling, 1978). If, however, a longer delay is used (which can be filled with other melodies or with some distracting task), this discrimination ability improves, indicating that contour declines in importance in favor of exact pitch intervals, as if the mental representation of pitch relations was being consolidated in terms of a sequence of interval categories (Dowling & Bartlett, 1981; Dewitt & Crowder, 1986).
The two approaches outlined above provide indirect evidence that intervals act as learned sound categories in perception, and are notable for finding effects in individuals who were not selected for their musical expertise. We now turn to studies of categorical perception (CP), a particular type of perception that has been widely explored in speech.
CP refers to two related phenomena. First, sounds that lie along a physical continuum are perceived as belonging to distinct categories, rather than gradually changing from one category to another. Second, sounds of a given degree of physical difference are much easier to discriminate if they straddle a category boundary than if they fall within the same category. CP is thus studied using both identification and discrimination tasks. Strong evidence for CP consists of identification functions with steep transitions between categories combined with discrimination functions that have pronounced maxima near category boundaries (Figure 2.4).
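These two signatures can be illustrated with a toy model, not drawn from the text: a logistic identification function with a steep transition at a hypothetical category boundary, paired with a discrimination score defined as the change in identification across a one-step pair of stimuli:

```python
import math

def identification(x, boundary=4.5, steepness=2.0):
    """Idealized probability of labeling stimulus x as category B:
    a logistic curve with a steep transition at the boundary."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - boundary)))

def discrimination(x, step=1.0):
    """Predicted discriminability of the pair (x, x + step); under CP
    it tracks how much identification changes across the pair."""
    return identification(x + step) - identification(x)

# Along a 10-step continuum, discrimination peaks for the pair that
# straddles the category boundary at 4.5:
scores = {x: discrimination(x) for x in range(10)}
best_pair_start = max(scores, key=scores.get)
```

In this toy model, stimuli well inside a category are labeled near-unanimously and discriminated poorly, while the pair straddling the boundary is discriminated best, matching the idealized pattern of Figure 2.4.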
Early demonstrations of CP for certain speech contrasts in infants (such as /p/ vs. /b/) were taken as evidence that speech relied on special neural mechanisms to transform continuous sound into discrete mental categories (Eimas et al., 1971). Subsequent research showed that CP for speech contrasts could be observed in animals (Kuhl & Miller, 1975), indicating that CP is not a special mechanism evolved for human speech. Instead, speech perception may in some cases take advantage of natural auditory boundaries (Handel, 1989; Holt et al., 2004). Furthermore, many important speech sounds, such as vowels and the lexical tones of tone languages, show little or no evidence of CP, despite the fact that they act as stable sound categories in perception (Fry et al., 1962; Stevens et al., 1969; Repp, 1984; Repp & Williams, 1987; Francis et al., 2003). This is an important point that bears repetition: The perception of vowels and lexical tones shows that the mind can map acoustic variation onto stable internal sound categories without CP. (I will return to this point later in this chapter, when comparing vowel perception in speech to chord perception in music.) For the moment the relevant point is this: Although evidence of CP proves that sounds are perceived in terms of discrete categories, lack of evidence of CP does not prove the converse. That is, sounds can be “categorically interpreted” without being categorically perceived (cf. Ladd & Morton, 1997). Thus lack of evidence of CP for musical intervals cannot really help us decide if pitch intervals act as learned sound categories or not.
Nevertheless, studies of CP in music are worth discussing because they highlight an issue that runs through the music cognition literature, namely cognitive differences between musicians and nonmusicians. Burns and Ward (1978) found evidence for CP of intervals by musicians, and suggested that musicians’ CP for interval size was as sharp as phonetic CP in speech (cf. Zatorre & Halpern, 1979; Howard et al., 1992). Nonmusicians, however, showed no convincing evidence of CP.10 Smith et al. (1994) conducted more extensive tests of nonmusicians, and tried to eliminate some of the task-related factors that might have obscured their performance in the Burns and Ward study. For example, listeners were given a thorough introduction to the concept of an interval, including many musical examples. One subgroup was trained on standard category labels (e.g., “major third”), whereas the other was given labels corresponding to familiar tunes, to encourage them to use their memory of these tunes in the interval identification task (e.g., the label “perfect fourth” was replaced with the label “Here comes the bride” because this familiar melody begins with this interval). Smith et al. found that the use of familiar tunes did make listeners’ identification functions steeper and more categorical, but listeners in both groups performed poorly on the discrimination tasks, with the category boundary having only a weak effect on discrimination. Thus nonmusicians did not show strong evidence of categorical perception of intervals.
Figure 2.4 An idealized categorical perception function. A speech sound is changed from /b/ to /d/ via small acoustic steps of equal size. Perception of the stimuli shifts dramatically from one category to the other at a particular point in the stimulus continuum, and discrimination between stimuli separated by one step along the continuum is best at this perceptual boundary. Courtesy of Louis Goldstein.
The issue of CP in nonmusicians has received little attention recently, but is worth discussing because it reflects a larger debate between those who see musicians and nonmusicians as having largely similar mental representations for music, differing only in skill at certain explicit procedures for which musicians have been trained (Bigand, 2003; Bigand & Poulin-Charronnat, 2006) and those who entertain the possibility that the mental representations for music in musicians and nonmusicians may be quite different (e.g., Dowling, 1986; Smith, 1997). For the current purposes, the lesson of this research is that one should remain open to the idea that the framework of learned interval standards may show substantial variability, not only between musicians and nonmusicians, but even among trained musicians (Cooke, 1992; Perlman & Krumhansl, 1996). In the end, the question of individual differences in sound category structure can be resolved only by empirical research. In the meantime, it is important not to think of variability among individuals as “bad.” In fact, from the standpoint of cognitive neuroscience, it is good because it creates the opportunity to study how the brain changes with experience.
In recent years, a novel approach to the study of musical sound categories has arisen thanks to research on a brain response called the mismatch negativity, or MMN (Näätänen, 1992). The MMN is an event-related potential (ERP) associated with automatic change detection in a repetitive auditory signal, and has neural generators in the auditory cortex. It is typically studied in an “oddball” paradigm in which a standard sound or sound pattern (e.g., a tone or a short sequence of tones) is presented repeatedly, with an occasional deviant introduced in a pseudorandom fashion. For example, in a simple version of this paradigm, the standard could be a short pure tone of 400 Hz that occurs with 85% probability, with the remaining tones being deviants of 420 Hz. A listener is typically instructed to ignore the tones (e.g., to read a book) while electrical (EEG) or magnetic (MEG) brain wave responses are recorded continuously from the scalp. Segments of the brain wave signal time-locked to the stimulus onset are averaged to produce ERPs to the standard and deviant tones. The MMN is the difference between the ERP to the standard and the deviant, and consists of a negative-going wave that peaks between 80 and 200 ms after the onset of the deviant tone (for one review, see Näätänen & Winkler, 1999; for a possible single-neuron correlate of the MMN, see Ulanovsky et al., 2003).
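A minimal simulation may make the averaging logic concrete. All specifics here (sampling rate, trial counts, the size and latency of the deflection) are invented for illustration, not taken from any study:

```python
import random
random.seed(1)

# Simulated single-trial EEG epochs (in microvolts) time-locked to tone
# onset; deviant epochs carry an extra negative deflection at 100-200 ms.
n_samples = 50  # e.g., 500 ms of signal sampled at 100 Hz

def epoch(is_deviant):
    wave = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
    if is_deviant:
        for t in range(10, 20):  # samples spanning 100-200 ms
            wave[t] -= 3.0       # negative-going deflection
    return wave

def average(epochs):
    """Average across trials at each time point to form an ERP."""
    return [sum(col) / len(epochs) for col in zip(*epochs)]

standard_erp = average([epoch(False) for _ in range(200)])
deviant_erp = average([epoch(True) for _ in range(200)])

# The MMN is the deviant ERP minus the standard ERP; its peak (most
# negative point) falls in the window where the deflection was added.
mmn = [d - s for d, s in zip(deviant_erp, standard_erp)]
peak_index = min(range(n_samples), key=lambda t: mmn[t])
```

Averaging many trials cancels the noise that is not time-locked to the stimulus, which is why the small difference wave becomes visible even though single trials are dominated by background activity.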
If the MMN were sensitive only to physical differences between stimuli, it would be of little interest to cognitive science. There is evidence, however, that the MMN is sensitive to learned sound categories. For example, Brattico et al. (2001) used a sequence of 5 short tones ascending in pitch as the standard stimulus. The deviant stimulus was the same sequence with the third tone raised in pitch above all the other tones, creating a salient change in the melodic contour. In one condition, the standard sequence conformed to the first 5 tones of a major scale, with semitone intervals of 2, 2, 1, and 2 between successive tones. In a second condition, the ascending tones of the standard sequence had unfamiliar intervals (0.8, 1.15, 1.59, and 2.22 st). The deviant in both conditions produced an MMN, as expected. The result of interest was that the MMN was significantly larger in the first condition, which used culturally familiar intervals. Importantly, both musicians and nonmusicians showed this effect (though the musicians showed a shorter latency of onset for the MMN).
Trainor, McDonald, and Alain (2002; cf. Fujioka et al., 2004) also conducted an MMN study in which nonmusicians showed sensitivity to musical intervals. Listeners heard a repeating 5-note sequence that was transposed on every repetition (i.e., the starting pitch was different but the pattern of intervals was the same). On some repetitions, the final interval was changed in size in a manner such that the overall contour of the sequence was preserved. Thus the interval pattern was the only thing that distinguished the sequences. The nonmusicians produced a robust MMN to the altered final interval, even though they were ignoring the sounds, suggesting that their auditory systems were sensitive to interval structure in music, presumably due to years of exposure to music.
Although the above studies are consistent with the notion that the brain represents culturally familiar musical intervals as learned sound categories, it would be desirable to use the MMN to test this idea in a more direct fashion. Specifically, one could conduct studies in which the standard was a musical interval (e.g., a minor third, 300 cents) and the deviant was an interval of different size. The size of the deviant could be altered in small steps across different blocks of the experiment, so that at some point it crosses an interval category (e.g., 350 to 450 cents in steps of 20 cents). One could then measure the MMN as a function of frequency distance. The question of interest is if the size of the MMN simply grows linearly with frequency difference, or if there is a jump in MMN size as the interval crosses into a new category. If so, this would be neural evidence that pitch intervals were acting as sound categories in perception.11 MMN studies using musical intervals have already been conducted that suggest the feasibility of such a study (Paavilainen et al., 1999).
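For reference, the cents values in this proposed design convert to frequency ratios by the standard definition (100 cents = 1 equal-tempered semitone, 1200 cents = 1 octave). A sketch, with the 440 Hz standard tone an arbitrary choice:

```python
def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval given in cents."""
    return 2 ** (cents / 1200.0)

def deviant_frequencies(standard_hz=440.0):
    """Upper tones for the deviant continuum described above:
    intervals of 350-450 cents in 20-cent steps."""
    return [standard_hz * cents_to_ratio(c) for c in range(350, 451, 20)]

# The standard interval, a minor third (300 cents) above 440 Hz:
minor_third_upper = 440.0 * cents_to_ratio(300)  # ~523.25 Hz
```

The continuum thus spans the region between a minor third (300 cents) and beyond a major third (400 cents), so some deviants fall within the standard's interval category and others cross into a neighboring one.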
We turn now from pitch to timbre. From an aesthetic standpoint, timbre is arguably as important as pitch as a perceptual feature of music. (Imagine, for example, the difference in the aesthetic and emotional impact of a jazz ballad expertly played on a real saxophone vs. on a cheap keyboard synthesizer.) From a cognitive standpoint however, timbre differs sharply from pitch in that the former is rarely the basis for organized sound contrasts produced by individual instruments. Of course, timbral contrasts between instruments are quite organized, and are used in systematic ways by composers from numerous cultures, for example, in Western, Javanese, and African instrument ensembles. A notable study in this regard is that of Cogan (1984), who conducted spectral analyses of symphonic music and proposed a theory of timbral contrasts inspired by linguistic phonetics (cf. Cogan & Escot, 1976).12 However, the salient point is that organized systems of timbral contrasts within instruments of a culture are rare. Why is this so? Before addressing this question, it is important to review some basic information about musical timbre.
Timbre, or sound quality, is usually defined as that aspect of a sound that distinguishes it from other sounds of the same pitch, duration, and loudness. For example, timbre is what distinguishes the sound of a trumpet from the sound of a flute playing the same tone, when pitch, loudness, and duration are identical. This definition is not very satisfying, as it does not address what timbre is, only what it is not. To make an analogy, imagine describing human faces in terms of four qualities: height, width, complexion, and “looks,” in which “looks” is meant to capture “what makes one person’s face different from another, when faces are matched for height, width, and complexion.” “Looks” clearly does not refer to a unitary physical dimension, but is a label for an overall quality created by the interplay of a number of different features (in a face, this might include the shape of the nose, the thickness of the eyebrows, etc.).
A review of the many features that influence musical timbre is beyond the scope of this book (for one such review, see Hajda et al., 1997). Here I focus on two prominent factors that are relevant for comparing music and speech: the temporal and spectral profile of a sound. The temporal profile refers to the temporal evolution of the amplitude of a sound, whereas the spectral profile refers to the distribution of frequencies that make up a sound, as well as their relative amplitudes (commonly referred to as the spectrum of a sound). For example, the temporal profile of a piano tone has a sharp attack and a rapid decay, giving it a percussive quality. In contrast, the onset and offset of a tone played in a legato manner by a violin are much more gradual (Figure 2.5).
The importance of the temporal envelope in the perception of a sound’s timbre can easily be demonstrated by playing a short passage of piano music backward in time, as in Sound Example 2.2: The result sounds like a different instrument. To illustrate the spectral profile of a sound, consider Figure 2.6, which shows the spectrum of a clarinet and a trumpet playing a single note.
Figure 2.5 Acoustic waveform of a piano tone (top panel) and a violin tone (bottom panel). Note the sharp attack and rapid decay of the piano tone vs. the gradual attack and decay of the violin tone.
A salient difference between these spectra is that in the case of the clarinet, the spectrum is dominated by partials that are odd-numbered multiples of the fundamental (i.e., partials 1, 3, 5, etc., in which 1 is the fundamental). In contrast, the trumpet spectrum does not have this asymmetric frequency structure.
There is one respect in which pictures of spectra, such as in Figure 2.6, can be misleading with regard to timbre. Such pictures are static, and might be taken to imply that each instrument has a spectrum that yields its characteristic timbre. In fact, the spectral profile of an instrument depends on the note being played and how loudly it is being produced. Furthermore, the spectral profile of a given note can vary at different points in a room because of acoustic reflections. Yet humans do not perceive dramatic changes in timbre because of these spectral changes (Risset & Wessel, 1999). This has led to a growing interest in dynamic and relational aspects of spectra, in the hope that more invariant features will emerge. For example, two dynamic features that underlie the characteristic timbre of a brass trumpet are the faster buildup of amplitude in lower versus higher harmonics as a note starts, and an increase in the number of harmonics as intensity increases within a note (Risset & Wessel, 1999; cf. McAdams et al., 1999; Figure 2.7).
Turning to the issue of musical timbre perception, empirical research in this area has relied heavily on similarity judgments. Typically, studies of this sort present the sounds of instruments in pairs (e.g., a French horn playing a note followed by a clarinet playing the same note), and ask listeners to rate how similar the two sounds are. After rating many such pairs, the similarity data are transformed using statistical techniques such as multidimensional scaling in an attempt to reveal the listener’s perceptual dimensions for judging timbral contrasts between instruments (Grey, 1977). One notable finding of such studies is that relatively few dimensions are needed to capture most of the variance in the data, at least when instruments from Western orchestras are used. For example, in a study by McAdams et al. (1995), three primary dimensions emerged, relating to the rise time of the amplitude envelope, the spectral centroid (i.e., the amplitude-weighted mean of the frequency components), and the spectral flux (a measure of how much the shape of the spectral profile changes over time within a single tone). Thus perceptual research confirms the point that there is more to timbre than a static snapshot of a spectral profile (cf. Caclin et al., 2005).
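The spectral centroid mentioned here is simply an amplitude-weighted mean frequency, which can be computed directly. In this sketch the two sets of partial amplitudes are hypothetical, contrived to contrast a "brighter" and a "darker" version of the same tone:

```python
def spectral_centroid(frequencies, amplitudes):
    """Amplitude-weighted mean frequency of a spectrum (in Hz)."""
    total = sum(amplitudes)
    return sum(f * a for f, a in zip(frequencies, amplitudes)) / total

# Hypothetical 5-partial spectra of a 200 Hz tone: equal amplitudes vs.
# energy concentrated in the low partials (a "darker" sound).
partials = [200.0, 400.0, 600.0, 800.0, 1000.0]
bright = spectral_centroid(partials, [1.0, 1.0, 1.0, 1.0, 1.0])
dark = spectral_centroid(partials, [1.0, 0.5, 0.2, 0.1, 0.05])
```

Shifting energy toward the lower partials pulls the centroid down, which corresponds perceptually to a darker, less brilliant timbre.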
Why do timbral contrasts rarely serve as the basis for musical sound systems? In section 2.2.2, I argued that pitch had an advantage (over loudness) as the basis for sound categories because it was perceptually multidimensional. Yet as noted at the end of the preceding section, timbre also has this quality. Furthermore, many musical instruments can produce salient timbral contrasts. A cello, for example, can produce a variety of timbres depending on how it is bowed. Specifically, a brighter sound quality results from bowing close to the bridge (sul ponticello) and a darker, mellower quality from bowing near the fingerboard (sul tasto). Both of these timbres are distinct from the sound produced by striking the strings with the wooden part of the bow instead of the hair (col legno). Further timbral possibilities are provided by playing harmonics, or by plucking instead of bowing. A talented cello player would likely be able to demonstrate several other distinct timbres based on different manners of bowing (cf. Randel, 1978; “bowing”). Despite this range of possible timbres, however, normal cello music is not organized primarily around timbral contrasts. This observation is true for most musical instruments, despite a few notable exceptions. For example, the music of the simple Jew’s harp emphasizes rapid timbral contrasts, and the Australian didgeridoo is renowned as an instrument in which timbre is the primary principle of sonic organization.
Figure 2.6 Spectrum of a clarinet (top) and a trumpet (bottom) playing the tone F3 (fundamental frequency ≈ 175 Hz). Note the dominance of the odd-numbered partials among the first 10 partials of the clarinet spectrum. (Note: y axis is logarithmic). Courtesy of James Beauchamp.
Figure 2.7 Amplitude dynamics of the first 20 partials of a trumpet tone (i.e., the fundamental and 19 harmonics). Note the faster buildup of amplitude in partials 1 and 2 versus 3-10 early in the tone, and the lack of energy in the partials 11-20 until ~200 ms. Courtesy of James Beauchamp.
I believe that there are both physical and cognitive reasons why timbre is rarely used as the basis for organized sound contrasts in music. The physical reason is that dramatic changes in timbre usually require some change in the way the instrument is excited (e.g., how it is struck or blown) or in the geometry and resonance properties of the instrument itself. For many instruments, rapid changes in either of these parameters are difficult or simply impossible. (Note that the Jew’s harp uses the mouth cavity as a resonator, which can change its resonant properties quickly by changing shape.) The cognitive reason is that timbral contrasts are not organized in a system of orderly perceptual distances from one another, for example, in terms of “timbre intervals” (cf. Krumhansl, 1989). In pitch, having a system of intervals allows higher-level relations to emerge. For example, a move from C to G can be recognized as similar in size to a move from A to E: The pitches are different but the interval is the same.
Ehresman and Wessel (1978) and McAdams and Cunibile (1992) have investigated whether listeners are capable of hearing “timbre intervals.” The latter study is discussed here. McAdams and Cunibile defined a timbre interval as a vector between two points in a two-dimensional perceptual space in which attack time and spectral centroid were the two dimensions (this space is derived from perceptual research on timbral similarities between instruments, as discussed in the previous section; see Figure 2.8).
The vector indicates the degree of change along each underlying perceptual dimension. Thus in Figure 2.8, the difference between the timbres of a trombone and a “guitarnet” (a hybrid of guitar and clarinet) defines one timbre interval. Starting at another sound and moving through a similar interval requires a move in this timbre space of the same direction and magnitude as the original vector. This is illustrated in Figure 2.8 by a move from a “vibrone” (a hybrid of a vibraphone and a trombone) to a harp. McAdams and Cunibile presented listeners with successive pairs of sounds (e.g., AB, followed by CD, in which each letter represents a sound), and asked in which case the change from A to B most resembled the change from C to D. They found some evidence for their proposed timbre intervals, but also found a good deal of variation among listeners, as well as context sensitivity (i.e., a dependence on the particular sounds that formed the interval rather than just on their vector distance). Their research raises the possibility that timbre relations are not perceived with enough uniformity to provide a basis for a shared category system among composers and listeners akin to the interval system provided by musical pitch (cf. Krumhansl, 1989).
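The vector idea can be sketched in a few lines of code. The coordinates below are invented for illustration (they are not McAdams and Cunibile's measured positions in the space); the point is only that two timbre moves count as analogous intervals when their vectors match in direction and magnitude:

```python
import math

# Hypothetical coordinates in a 2-D timbre space
# (attack time, spectral centroid), in arbitrary units.
# These values are made up for illustration only.
timbres = {
    "trombone":  (8, 2),
    "guitarnet": (3, 6),
    "vibrone":   (9, 4),
    "harp":      (4, 8),
}

def interval(a, b):
    """Vector from timbre a to timbre b in the timbre space."""
    (ax, ay), (bx, by) = timbres[a], timbres[b]
    return (bx - ax, by - ay)

def interval_distance(v, w):
    """Euclidean distance between two timbre-interval vectors;
    0 means the two moves share direction and magnitude."""
    return math.hypot(v[0] - w[0], v[1] - w[1])

# Trombone -> guitarnet and vibrone -> harp trace the same vector,
# so they form analogous "timbre intervals".
ab = interval("trombone", "guitarnet")
cd = interval("vibrone", "harp")
print(interval_distance(ab, cd))  # 0.0
```

The listener variability McAdams and Cunibile found suggests that perceived similarity does not reduce cleanly to this vector distance, which is exactly the context sensitivity described above.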
Figure 2.8 Timbre intervals. Acronyms refer to real or synthetic instruments studied by McAdams and Cunibile (1992), arrayed in a two-dimensional timbre space. A timbre interval consists of a move of a particular direction and magnitude (i.e., a vector) in this space, as between TBN and GTN (trombone and guitarnet) or between VBN and HRP (vibrone and harp). From McAdams, 1996.
Arnold Schoenberg (1911:470-471) once articulated a desideratum for a timbre-based music:
If it is possible to make compositional structures from sounds which differ according to pitch, structures which we call melodies … then it must also be possible to create such sequences from … timbre. Such sequences would work with an inherent logic, equivalent to the kind of logic which is effective in the melodies based on pitch.… All this seems a fantasy of the future, which it probably is. Yet I am firmly convinced that it can be realized.
Given the many innovations in music since Schoenberg’s time (including computer music, which eliminates the constraints of physical instruments), it is a curious fact that “timbre melody” (Klangfarbenmelodie) has not become a common feature on the musical landscape in Western culture. As discussed above, it may be that timbre-based music has not succeeded in the West because of the difficulty of organizing timbre in terms of intervals or scales.
There are, however, other ways to organize timbral systems. After all, the existence of speech shows that rich sound systems can be based on timbral contrasts that have nothing to do with intervals or scales. Thus it is likely no coincidence that one successful musical tradition based on timbral contrasts uses a system with strong parallels to the organization of timbre in speech. This is a drumming tradition of North India based on the tabla (Figure 2.9).
The tabla consists of a pair of hand drums used to provide the rhythmic accompaniment to instrumentalists and singers in classical and popular music (Courtney, 1998; Kippen, 1988). The player, seated on the ground, plays the smaller, high-pitched drum with the right hand and the larger low-pitched drum with the left hand. Drum strokes are distinguished in several ways: by which fingers hit the drum, the region of the membrane struck, whether other fingers damp the membrane while striking it, whether the striking finger bounces off the membrane (open stroke) or is kept pressed to the membrane after the strike (closed stroke), and whether just one drum is struck or both are struck (see Figure 2.10). Each particular combination gives rise to a distinct timbre. Table 2.1 provides details on 8 tabla strokes, which can be heard in Sound Examples 2.3a-h. The number of timbrally distinct drum strokes is about 12. (This is fewer than the number of different vocables because different vocables are sometimes used to name the same stroke depending on context.)
Figure 2.9 North Indian tabla drums. The pitch of the dayan is tuned to the basic note of the scale being played on the melodic instrument that it accompanies. The pitch of the bayan is tuned lower, and can be modulated by applying pressure with the heel of the palm to the drum head.
Figure 2.10 Top view of dayan (right) and bayan (left). Each drum surface is divided into three regions: a rim, a head, and a central circular patch of iron filings and paste. Indian names of the regions are also given. Surface diameters: dayan ~14 cm, bayan ~22 cm.
Parallels between the timbral organization of tabla drumming and speech take two forms. First, the different ways of striking the drum can be represented in terms of a matrix of “place and manner of articulation,” akin to the organization of speech sounds (cf. Section 2.3.3, Table 2.3; cf. Chandola, 1988). For example, the sound produced by striking the right drum with the index finger differs dramatically depending on where the drum is struck (“place of articulation,” e.g., edge vs. center) and how it is struck (“manner of articulation,” e.g., with a bouncing stroke or with a closed stroke that results in damping). The second parallel between musical and spoken timbre is more explicit. Each kind of drum stroke is associated with a particular verbal label (a nonsense syllable or “vocable”) that is used in teaching and composition, as this is an oral tradition. Players have an intuition that there is an acoustic and perceptual resemblance between each stroke and its associated vocable (called a “bol,” from the Hindi word for speech sound). Empirical research has confirmed this link, which is interesting given that the drum and the voice produce sounds in very different ways (Patel & Iversen, 2003). This research will be described in more detail (in section 2.3.3, subsection “Mapping Linguistic Timbral Contrasts Onto Musical Sounds”) after discussing timbral contrasts in speech.
Table 2.1 Details of 8 Tabla Strokes and Vocables (Bols)
The tabla is but one of several drumming traditions that feature parallels to speech in terms of the organization of timbral patterning (for an example from Africa, see Tsukada, 1997). To my knowledge, however, tabla drumming is unsurpassed among widespread musical traditions in terms of the diversity and speed of timbral contrasts produced by a single player. A sense of the rhythmic virtuosity of a talented tabla player is given in Sound/Video Example 2.4.
The purpose of this part of the chapter is to discuss organized sound contrasts in language in terms of their similarities and differences to musical sound systems. Before embarking, it is worth reviewing a few basic concepts concerning the study of speech sounds.
The study of linguistic sound systems is divided into two broad and overlapping fields: phonetics and phonology. Phonetics is the science of speech sounds, and includes the study of the acoustic structure of speech and the mechanisms by which speech is produced and perceived (acoustic, articulatory, and auditory phonetics). Phonology is the study of the sound patterns of language, and includes the study of how speech sounds are organized into higher level units such as syllables and words, how sounds vary as a function of context, and how knowledge of the sound patterns of language is represented in the mind of a speaker or listener.
To illustrate the difference between these two approaches to sound systems, consider the phenomenon of syllable stress in English. Syllables in English speech vary in their perceptual prominence, even in sentences spoken without any special emphasis on a particular word or phrase. For example, if a listener is asked to mark the prominent syllables in “the committee will meet for a special debate,” most listeners will perceive the following syllables as stressed: the comMITtee will MEET for a SPEcial deBATE. A study that focused on the acoustic differences between stressed and unstressed syllables in English (e.g., quantifying the extent to which perceived stress corresponds to parameters such as increased duration or intensity) would be a study in phonetics. In contrast, a study that focused on the patterning of stress in English words and sentences (e.g., the location of stress in English words, whether there is a tendency for stressed and unstressed syllables to alternate in an utterance) would be a study in phonology. This example illustrates the fact that in practice, phonetics usually deals with the measurement of continuous acoustic or articulatory parameters, whereas phonology usually deals with the organization of categorically defined elements.13
A fundamental concept in the study of linguistic sound systems is the phoneme. A phoneme is the minimal speech unit that can distinguish two different words in a language. For example, in American English, the vowel in the word “bit” and the vowel in the word “beet” are different phonemes, because these words mean different things. In a different language, however, the English pronunciation of these two syllables might mean the same thing, indicating that the two sounds are variants (allophones) of a single phoneme. Thus the definition of a phoneme relies on the meaning of words. Linguists have developed efficient ways to represent phonemes in writing, using a standardized system known as the International Phonetic Alphabet, or IPA.14
Another key linguistic concept is that sound structure in speech is hierarchically organized, and that the phoneme is just one level in this hierarchy. Going down one level, phonemes are often analyzed as bundles of distinctive features (Jakobson et al., 1952; Chomsky & Halle, 1968). One motivation for this lower level of analysis is that the phonemes of a language are not simply an unordered set of items, but have relationships of similarity and difference in terms of the way they are produced. For example, the two phonemes /p/ and /b/ are similar in many respects, both involving a closure of the lips followed by a rapid release and the onset of vocal fold vibration (though the interval between release and the onset of vocal fold vibration, referred to as voice onset time, is much shorter in /b/ than in /p/). Thus the two phonemes can be analyzed as sharing a number of articulatory features, and differing in a few others. Analysis in terms of features has proved useful in linguistics because it can help shed light on certain sound patterns in speech, such as why certain phonemes are replaced by others in historical language change.15 Going one level up, phonemes are organized into syllables, which play an important role in many aspects of a language, including speech rhythm. Syllables are organized around vowels: A syllable is minimally a vowel and is typically a vowel plus preceding and/or following consonants, bound into a coherent unit in both speech production and speech perception (syllables are discussed in greater detail in Chapter 3).16
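The idea of phonemes as feature bundles can be illustrated with a toy data structure. The three features below are a drastic simplification of real feature systems (Jakobson's or Chomsky and Halle's inventories are far richer); the sketch only shows how /p/ and /b/ come out sharing place and manner while differing in voicing:

```python
# Phonemes as bundles of distinctive features (a simplified sketch;
# real feature systems use many more dimensions than these three).
features = {
    "p": {"place": "labial", "manner": "stop", "voiced": False},
    "b": {"place": "labial", "manner": "stop", "voiced": True},
    "s": {"place": "alveolar", "manner": "fricative", "voiced": False},
}

def shared_features(a, b):
    """Features on which two phonemes carry the same value."""
    return {f for f in features[a] if features[a][f] == features[b][f]}

# /p/ and /b/ agree on place and manner, differing only in voicing.
print(sorted(shared_features("p", "b")))  # ['manner', 'place']
```

Representing phonemes this way makes similarity relations explicit, which is one reason feature analysis helps explain why certain phonemes substitute for others in historical sound change.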
A final important point concerns linguistic diversity. There are almost 7,000 languages in the world17 (cf. Comrie et al., 1996). Some are spoken by a handful of individuals and one, Mandarin Chinese, by over a billion. The search for general patterns in linguistic sound systems requires a genetically diverse sample of languages, not simply a sample of languages that are widely spoken today (Maddieson, 1984). This point is analogous to the point that generalizations about musical sound systems require a cross-cultural perspective, not just a focus on widely disseminated musics. To take just one example, in the cognitive science literature, the language that is most commonly used to illustrate a tone language is Mandarin. (For those who do not know what a tone language is, see the following section.) Mandarin has four tones: Tone 1 is a level tone of fixed pitch, whereas tones 2-4 are contour tones involving salient pitch movement. A survey of diverse tone languages reveals, however, that having only 1 level tone is typologically very unusual (Maddieson, 1999). This fact is relevant to the comparative study of language and music, because linguistic systems with level tones are the logical place to look for a close comparison between pitch contrasts in speech and music. This is the topic of the next section.
How do the organized pitch contrasts of language compare to the organized pitch contrasts of music? To address this question, it is first necessary to give some background on the use of pitch in speech.
Although humans are capable of speaking on a monotone, they rarely do so. Instead, speech features salient modulation of voice pitch, whose most important physical correlate is the fundamental frequency of vocal fold vibration (abbreviated F0 [“F-zero” or “F-nought”]; see Chapter 4 for more details). This modulation is far from random: It is richly structured and conveys a variety of linguistic, attitudinal, and emotional information (’t Hart et al., 1990). Certain aspects of pitch variation due to emotion are universal. For example, happiness is associated with a wide pitch range and sadness with a narrow pitch range, reflecting the greater degree of arousal in the former state. This is an example of a gradient pitch contrast: Pitch range due to emotion is not produced or perceived in a discrete fashion, but varies continuously to reflect a continuously variable affective state.
Of primary interest here are linguistic pitch contrasts that are organized and perceived in terms of discrete categories. Such organization reaches its pinnacle in tone languages. A tone language is a language in which pitch is as much a part of a word’s identity as are the vowels and consonants, so that changing the pitch can completely change the meaning of the word. Figure 2.11 shows a plot of the pitch of the voice as a speaker says four words in Mambila, a language from the Nigeria-Cameroon border in West Africa (Connell, 2000). These tones were produced by speaking words that were matched for segmental content but differed in tone.
Tone languages may seem exotic to a speaker of a European language, but in fact over half of the world’s languages are tonal, including the majority of languages in Africa and southeast Asia (Fromkin, 1978). Such languages are a logical place to look if one is interested in asking how close linguistic pitch systems can come to musical ones. Examination of such languages can help isolate the essential differences between music and speech in terms of how pitch contrasts are structured.
Figure 2.11 Examples of words with 4 different level tones from Mambila, a language from the Nigeria-Cameroon border. The four words illustrating these tones are as follows: T1 = mbán (breast), T2 = bā (bag), T3 = ba (palm of hand), T4 = bá (wing). The top panel shows the acoustic waveforms of the words, and the bottom panel shows voice pitch as recorded from one speaker. From Connell, 2000.
In seeking tone languages to compare to music, the researcher is naturally drawn to languages that use level tones (such as Mambila) rather than contour tones. As mentioned previously, a contour tone is a pitch trajectory that cannot be broken down into more basic units, as illustrated by tones 2-4 of Mandarin Chinese. In contrast, a level tone represents a level pitch target. A focus on Mandarin might lead one to think that level tones are unusual in language. Maddieson (1978, cf. Maddieson 2005) has conducted a cross-linguistic survey that reveals that the truth is in fact the reverse: The great majority of tone languages have only level tones, with the most common number of levels being 2, though 3 is not uncommon. Maddieson also found that the large majority of languages with 2 or 3 tones have only level tones, and that the step from 3 to 4 tones represents a break point in the organization of tone languages. Specifically, this is the point at which contour tones begin to become important: Such tones are rare in 3-tone systems, but quite common in 4-tone systems. Thus there are no languages with only contour tones, and languages with large tonal inventories typically contrast no more than 3 level tones and use contour tones for the rest of their inventory. The generalization that emerges from these studies is that languages with level tones generally do not contrast more than 3 pitch levels.
What is known about level tones in language and their pitch contrasts? First, the maximum number of level tones in a language is 5, and languages with this many levels are very rare (Maddieson, 1978; Edmondson & Gregerson, 1992). Table 2.2 lists some languages with 5 level tones, whereas Figure 2.12 shows a phonological analysis of the tones of one such language. Figure 2.13 shows the geographic distribution of African languages with 5 level tones, showing their relative rarity (Wedekind, 1985).
Table 2.2 Languages With 5 Level Tones
Figure 2.12 The linguistic tones of Ticuna, a tone language spoken by a small group of people from the Amazon regions of Peru, Colombia, and Brazil. This language has been analyzed as having 5 level tones and 7 glides. The numerals represent relative pitch levels in a speaker’s voice. From Anderson, 1959, cited in Edmondson & Gregerson, 1992.
Second, level tones divide up frequency space in a particular way. This topic was investigated by Maddieson (1991), who contrasted two different hypotheses about how frequency space was allocated to level tones as the number of level tones in a language increased. The first hypothesis is that based on the idea of maximal dispersion, tones will be as far apart as they can be in the pitch range of a speaker. Thus a system with 2 level tones has a wide spacing between them, and additional tones subdivide the fixed pitch space in a manner that maximized intertone distance. The second hypothesis proposes that there is a “more-or-less fixed interval, relative to a speaker’s range, which serves as a satisfactory degree of contrast between levels” (Maddieson, 1991, p. 150), and thus a larger number of tones will occupy a larger pitch range than a smaller number.
The available data favor the second hypothesis: As the number of tones in a language grows, so does the pitch range, suggesting that there is a basic minimal interval between level tones that languages attempt to attain. Although there is not a great deal of empirical data on tone spacing between level tones, the available data suggest that the minimum interval is between 1 and 2 semitones, the maximum is about 4 semitones, and an intertone spacing of 2-3 semitones is common (Maddieson, 1991; Connell, 2000; Hogan & Manyeh, 1996). This suggests why the maximum number of level tones in human languages is 5. Because pitch range increases with the number of tones, 5 level tones may correspond to the maximum range that can be both comfortably produced and clearly subdivided in terms of perception.
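The semitone figures above come from the standard logarithmic interval measure, interval = 12·log2(f2/f1). A minimal sketch, with hypothetical F0 targets chosen to fall in the 2-3 semitone spacing that the surveys report:

```python
import math

def semitones(f1, f2):
    """Interval in semitones between two frequencies (Hz)."""
    return 12 * math.log2(f2 / f1)

# Hypothetical F0 targets for a 3-level tone system (illustrative
# values, not measurements from any particular language).
low, mid, high = 100.0, 115.0, 132.0
print(round(semitones(low, mid), 2))   # ~2.42 semitones
print(round(semitones(mid, high), 2))  # ~2.39 semitones
```

Doubling a frequency gives 12 semitones (an octave), so spacing measured this way is independent of whether a speaker's voice is high or low overall.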
In order to make a close comparison between pitch contrasts in a level tone language and pitch contrasts in music, it is necessary to analyze speech from level tone languages in detail. Which languages should be analyzed? Among level tone languages (also called register tone languages), there are two subcategories. In one, the overall pitch of the voice lowers as a sentence progresses (“downtrend”) due to various linguistic factors (Connell, 1999). In such languages a tone is high or low only in relation to its immediate neighbors, so that a “high” tone near the end of a sentence may actually be lower in pitch than a “low” tone near its beginning (Hyman, 2001). Clearly, pitch contrasts in such languages are not suitable for comparison with music. However, there are other tone languages that do not have downtrend. Such “discrete level” tone languages are the best languages for comparing pitch contrasts in language and music. An early description of discrete level tone languages is offered by Welmers (1973:81):
Figure 2.13 The geographic distribution of African tone languages according to their number of level tones. Note the pattern whereby 2-tone languages surround 3-tone languages, which in turn surround 4-tone languages, which in turn surround 5-tone languages. This may suggest a historical process of increasing or decreasing tonal differentiation. From Wedekind, 1985.
In many such languages, each level tone is restricted to a relatively narrow range of absolute pitch (absolute for a given speaker under given environmental conditions) within a phrase, and these tonemic ranges are discrete—never overlapping, and separated by pitch ranges which are not used—throughout the phrase, though they may all tilt downward at the very end of the phrase in a brief final contour. Thus, in a three-level system, high tone near the end of the phrase has virtually the same absolute pitch as a high tone at the beginning of the phrase, and is higher than any mid tone in the phrase. Usually there are few restrictions in tone sequences. These phenomena may be illustrated by the following sentence in Jukun (Nigeria):
Every possible sequence of two successive tones occurs in this sentence. The three levels are discrete throughout the sentence, and so precisely limited that playing them on three notes on a piano (a major triad does very well) does not appreciably distort the pitches of normal speech.
(NB: A “major triad” consists of three tones, the upper two of which are 4 and 7 semitones above the lowest tone, e.g., C-E-G on a piano.) Welmers’ report is provocative, particularly because he implies stable pitch intervals. He also notes that Jukun is not an isolated case, but one of many West and Central African languages with discrete level tone systems of 2 to 4 tones. It should be noted, though, that Welmers wrote in the days before easy F0 extraction, and that his descriptions have yet to be verified by empirical data.
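The triad can be computed from those semitone offsets with the equal-temperament relation f = f0 · 2^(n/12). The root frequency below (C4 ≈ 261.63 Hz) is an illustrative choice, not anything specified by Welmers:

```python
def freq_at(base_hz, semitones_up):
    """Frequency a given number of equal-tempered semitones above base_hz."""
    return base_hz * 2 ** (semitones_up / 12)

# A major triad: the root plus tones 4 and 7 semitones above it.
# C4 ~ 261.63 Hz is used as the root for illustration.
root = 261.63
triad = [round(freq_at(root, s), 1) for s in (0, 4, 7)]
print(triad)  # [261.6, 329.6, 392.0] -> C4, E4, G4
```

On Welmers' description, the three level tones of a Jukun phrase would hold steady near three such fixed frequencies, which is what testing with modern F0 extraction could confirm or refute.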
The study of languages such as Jukun would allow us to see how close speech can come to music in terms of pitch contrasts. In particular, by labeling tones in sentences of such languages and examining the frequency distances between them, one could ask if there is any evidence for stable intervals. (Of course, it should be kept in mind that intervals in music are subject to contextual variation; cf. Rakowski, 1990. Thus in comparing interval stability in music vs. speech, the comparison of interest would be the degree of contextual variation.) There is reason to suspect that there will be a great deal of flexibility in tone spacing in speech. Recall that pitch in speech does double duty, carrying both affective signals and linguistic ones. Thus it would be adaptive to make pitch contrasts in speech flexible in terms of the physical distances between tones. This way, tonal contrasts could accommodate to elastic changes in the overall pitch range due to affective factors, or due to other factors that change pitch range in a gradient way, such as the loudness with which one speaks (Ladd, 1996:35).
The flexibility of pitch spacing in speech is supported by research on pitch range by Ladd and a number of his students (Ladd, forthcoming). Based on this work, Ladd suggests that speech tones are scaled relative to two reference frequencies corresponding to the top and bottom of an individual’s speaking range. This range can vary between speakers, and can be elastic within a speaker, for example, growing when speaking loudly or with strong positive affect. Ladd argues that what stays relatively constant across contexts and speakers is pitch level as a proportion of the current range. (See this chapter’s appendix 4 for the relevant equation; this idea was also articulated by Earle, 1975.) Figure 2.14 illustrates this idea using a hypothetical example based on a language with four level tones: low, low-mid, high-mid, and high.
In Figure 2.14, the same tones are spoken by Speakers A, B, and C. Speaker A’s frequencies range from 100 to 200 Hz. (These frequencies might have been measured from the speaker saying words in citation form, as in Figure 2.11). Speaker B is, say, a male with a deep voice (range 60 to 120 Hz), whereas Speaker C has a narrow range, between 100 and 133 Hz. (A narrow range can be due to affective factors, such as sadness; Scherer, 1986.) Note that in each case, the level of a tone as a proportion of the speaker’s range is the same: The low and high tones are at the bottom/top of the range, whereas the mid and mid-high tones are 20% and 50% of the way from the bottom to the top.
For the present discussion, what is notable about this scheme is that both the absolute frequency of tones and the pitch intervals between tones (in semitones) vary between speakers. Thus Speakers A and B have identical pitch intervals, but this is only because they have the same pitch range when measured in semitones (in this case, 1 octave). For both of these speakers, tones are separated by approximately 3, 4, and 5 semitones as one goes between tones in ascending order. However, for Speaker C, who has a narrow pitch range (about 5 semitones), the ascending intervals are about 1, 1.5, and 2 semitones.
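The arithmetic behind these figures can be sketched in a few lines of code. The sketch below assumes the four tones sit at 0%, 20%, 50%, and 100% of the speaker’s range, with the proportions taken linearly in Hz (Ladd’s actual equation appears in this chapter’s appendix 4 and may differ in detail); under those assumptions it reproduces the intervals quoted above for Speakers A, B, and C.

```python
import math

def tone_freqs(f_bottom, f_top, proportions=(0.0, 0.2, 0.5, 1.0)):
    """Frequencies of level tones placed at fixed proportions of a speaker's range."""
    return [f_bottom + p * (f_top - f_bottom) for p in proportions]

def semitone_intervals(freqs):
    """Ascending intervals between successive tones, in semitones."""
    return [12 * math.log2(hi / lo) for lo, hi in zip(freqs, freqs[1:])]

# Speaker A: 100-200 Hz; Speaker B: 60-120 Hz; Speaker C: 100-133 Hz
for name, (lo, hi) in {"A": (100, 200), "B": (60, 120), "C": (100, 133)}.items():
    f = tone_freqs(lo, hi)
    print(name, [round(x, 1) for x in f],
          [round(i, 1) for i in semitone_intervals(f)])
```

Speakers A and B yield identical interval patterns (about 3.2, 3.9, and 5.0 semitones) despite different absolute frequencies, while Speaker C’s narrow range compresses the intervals to roughly 1, 1.5, and 2.3 semitones.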
Figure 2.14 Examples of linguistic tone spacing illustrating the notion of range-based proportional scaling. See text for details.
Ladd argues that this range-based scaling of linguistic pitch levels applies within utterances as well, when pitch range changes over the course of an utterance. In fact, it is quite common for pitch range to shrink over the course of an utterance as part of F0 declination (cf. Vaissiere, 1983). Thus, for example, a speaker may start a long sentence with a range of about 1 octave but end it with a range of only 11 semitones. According to Ladd’s scheme, the intervals between tones will shrink over the course of the utterance. Although Welmers’ description suggests that discrete level tone languages may be immune from this sort of declination, empirical work is needed to show if this is in fact the case. However, even if such languages do not show declination, it seems very likely that measurements will show that discrete level tone languages do not use fixed pitch intervals. One reason for this is the elasticity of the pitch range, which is in turn driven by factors such as affect and the loudness with which one speaks. Another is that in connected discourse, lexical tones are subject to contextual forces (such as articulatory sluggishness) that can change their surface form (Xu, 2006).
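Under the same proportional-scaling assumption (tone proportions taken linearly in Hz, as in the earlier hypothetical four-tone example), one can sketch how the intervals would shrink over an utterance whose range narrows from 12 to 11 semitones. For simplicity the sketch keeps the bottom of the range fixed; real declination typically lowers the whole range.

```python
import math

def tone_freqs(f_bottom, f_top, proportions=(0.0, 0.2, 0.5, 1.0)):
    """Frequencies of level tones placed at fixed proportions of the current range."""
    return [f_bottom + p * (f_top - f_bottom) for p in proportions]

def semitone_intervals(freqs):
    return [12 * math.log2(hi / lo) for lo, hi in zip(freqs, freqs[1:])]

start = tone_freqs(100, 200)                  # utterance onset: 12-semitone range
end = tone_freqs(100, 100 * 2 ** (11 / 12))   # utterance end: range shrunk to 11 semitones
print([round(i, 1) for i in semitone_intervals(start)])
print([round(i, 1) for i in semitone_intervals(end)])
```

Every interval narrows (from about 3.2, 3.9, 5.0 semitones to about 2.8, 3.5, 4.6), consistent with Ladd’s claim that intervals between tones shrink as the range shrinks.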
Before closing this section, it might interest the reader to hear a sample of Jukun. Sound Example 2.5 presents a short spoken passage in Jukun (Wukari dialect, the dialect probably referred to by Welmers).18 This passage is from a translation of a story called “The North Wind and the Sun,” which has been recorded in many languages for the purpose of comparative phonetic analysis. Of course, this simply represents a single speaker, and is provided simply as an illustration.
The preceding section introduced the idea that a key to a linguistic tone’s identity as a sound category is its position as a percent of a speaker’s pitch range. This suggests that linguistic tone production and perception is a relative matter: A tone’s identity depends on its position in a pitch range, a range that can change from speaker to speaker or even within the utterances of a single speaker (cf. Wong & Diehl, 2003, for relevant empirical data). If this view is correct, a listener must have some way of getting a relatively rapid and accurate estimate of where a given tone falls in the pitch range of a speaker. Interestingly, it may be possible for a listener to make such judgments even on isolated syllables, due to voice-quality factors that vary depending on location in the speaker’s pitch range (Honorof & Whalen, 2005).
In contrast to the view that linguistic tone is a relative matter, Deutsch et al. (2004) have articulated the idea that a linguistic tone’s absolute frequency can be part of its identity as a sound category. In support of this idea, they report a study in which speakers of Vietnamese and Mandarin were asked to read a word list in their native language on 2 separate days. The words used the different linguistic tones of each language. For each speaker, the mean pitch of each word was quantified on Day 1 and Day 2. The main finding was a high degree of within-speaker consistency in the pitch of a given word across the two recordings. For most speakers, the difference in the pitch of a word on Day 1 and Day 2 was quite small: a 1/2 semitone or less. English speakers, in contrast, showed less pitch consistency on the same task.
Based on these results, Deutsch et al. suggest that tone-language speakers have a precise and stable absolute pitch template that they use for speech purposes. They argue that such a template is acquired as a normal part of early development, in the same way that infants acquire other aspects of their native phonology. In support of this notion, they point to research in music perception suggesting that infants can use absolute pitch cues when learning about the structure of nonlinguistic tone sequences (e.g., Saffran & Griepentrog, 2001; Saffran, 2003; Saffran et al., 2005; but see Trainor, 2005, for a critique). Deutsch et al. argue that absolute pitch originated as a feature of speech, with musical absolute pitch—the rare ability to classify pitches into well-defined musical categories without using any reference note—piggybacking on this ability (musical absolute pitch is discussed in more detail in Chapter 7, in sections 7.3.4 and 7.3.5).
As one might expect, the ideas of Deutsch et al. (2004) are proving quite controversial. In response to Deutsch et al.’s ideas, one issue being examined by speech scientists is the claim that tone-language and nontone-language speakers are really different in terms of the pitch consistency with which they read words on separate days. Burnham et al. (2004) addressed this issue in a study that used Mandarin, Vietnamese, and English speakers. A novel aspect of their study was that they tested whether a fixed word order in the reading list (as used by Deutsch et al.) might influence the results. They found that even when the order of words was changed from Day 1 to Day 2, Mandarin and Vietnamese speakers showed higher intraword pitch consistency than English speakers. However, the difference was small: around 3/4 semitone for tone language speakers versus about 1 semitone for English speakers. Thus there does not appear to be a huge difference between tone-language and nontone-language speakers in this regard.
It is clear that the debate over absolute pitch in speech is not over. For the moment, the idea that absolute pitch plays a role in speech is best regarded as tentative. A more compelling link between language and music with regard to absolute pitch is the finding by Deutsch and colleagues (2006) that musicians who are native speakers of a tone language are substantially more likely to have musical absolute pitch than are musicians who do not speak a tone language. The researchers tested a large number of musicians in the Central Conservatory of Music in Beijing and in the Eastman School of Music in New York. Participants were given a musical AP test involving naming the pitch class of isolated tones spanning a 3-octave range. Differences between the Chinese and American groups on this test were dramatic. For example, for students who had begun musical training between ages 4 and 5, approximately 60% of Chinese musicians showed AP, compared to 14% of the English speakers. This finding raises the interesting question of whether developing a framework for the categorical interpretation of pitch in speech facilitates the acquisition of musical pitch categories.19
The current review of pitch contrasts in language suggests that pitch relations in speech and music are fundamentally incommensurate, with the most salient difference being the lack of stable intervals in speech. It may come as a surprise, then, that there are cases in which musical instruments with fixed pitches are used to convey linguistic messages. One famous example of such a “speech surrogate” is the talking drum of West Africa (Herzog, 1945; Locke & Agbeli, 1980; Cloarec-Heiss, 1999). A talking drum communicates messages by mimicking the tones and syllabic rhythms of utterances in a tone language. Some talking drums allow the player to adjust the tension of the drum membrane (and thus its pitch) by pressing on thongs connected to the drumheads: These drums can mimic gliding tones in speech. Other drums have no membranes and are made entirely of wood: These drums produce fixed pitches that reflect the resonance properties of the instrument. Wooden instruments can reach a considerable size and are capable of conveying messages over long distances, up to several miles. For example, Carrington (1949a, 1949b) described a cylindrical log used by the Lokele of the upper Congo. The log is hollowed out via a simple rectangular slit (Figure 2.15a).
The hollowing is asymmetrical, so that there is a deeper pocket under one lip of the drum than the other (Figure 2.15b). The drum produces two tones: a lower tone when the lip over the hollower side is struck, and a higher pitched tone when the other side is struck. These two tones are meant to mimic the two linguistic tones of spoken Lokele. Carrington noted that one such drum had an interval of approximately a minor third between the tones (3 semitones), but that other drums had other intervals, and there was no fixed tuning scheme. This suggests that relations between linguistic tones are not conceived (or perceived) in terms of standardized intervals.20
Figure 2.15 A talking drum of the Lokele people of the upper Congo in central Africa. A side view of the drum is shown in (A). The two tones are produced by striking the drum lips on either side of the central slit. A cross section of the drum is shown in (B), revealing the asymmetric degree of hollowing under the two lips of the drum. From Carrington, 1949b.
Another example of a pitch-based speech surrogate is whistled speech (Cowan, 1948; Stern, 1957; Sebeok & Umiker-Sebeok, 1976; Thierry, 2002; Rialland, 2005). In tone languages, the pitch of the whistle acts as a surrogate for linguistic tones (“tonal whistled speech”).21 A description of the efficacy of this communication system is provided by Foris (2000), a linguist who spent many years studying the Mexican tone language Chinantec. Chinantec whistled speech uses a combination of tone and stress distinctions to communicate messages with minimal ambiguity. Foris writes:
Virtually anything that can be expressed by speech can be communicated by whistling. The most complex example that I had interpreted for me was on the occasion that the supply plane was due to come in. Because of heavy rains, I checked the dirt airstrip for erosion and saw that it needed extensive repairs. I went to the town president and explained the need for immediate repairs, a job that was the responsibility of the town police in those days. The town is set in a horseshoe-shaped hillside; his house is at one end of the “arm,” with the town hall at the centre, about 1/2 a kilometre away. He put his fingers in his mouth and whistled to get their attention. They responded that they were listening, and he whistled a long message. I asked for the interpretation, which he gave as the following: “The plane will be here soon. The airstrip needs to be repaired. Get the picks, shovels, and wheelbarrows and fix it right away.” (pp. 30-31)
If verified, this is indeed a remarkable finding. Quite apart from this example, there are well-documented accounts of tonal whistled speech from other cultures in which whistling is used to exchange simpler and more stereotyped messages. For the current purpose, the relevant question is whether these systems use fixed pitch intervals. Unfortunately, there are no good empirical data to address this issue. Because tonal whistled speech is usually produced by mouth rather than by an instrument with fixed pitches, it seems unlikely that there is a tuning scheme that relies on standardized pitch intervals.
Perhaps the closest that a speech surrogate comes to using a fixed tuning scheme is the “krar speech” of southwest Ethiopia. The krar is a 5-string instrument resembling a guitar, which is used to imitate the 5 discrete level tones of the language Benčnon. Wedekind (1983) describes a game in which an object is hidden, and then the seeker is asked to find it. The only clues the seeker receives come from the krar player, who “tells” the seeker where the object is hidden using the musical instrument to imitate the linguistic tones of utterances. Krar speech merits study, because the intervals between tones are fixed, at least for a given krar. How important are the particular intervals to the communication of linguistic messages? One could test this by studying how much one could alter the exact tuning of a given krar and still communicate messages with a given efficacy. If standardized pitch intervals are not important for linguistic tones, one would predict that a good deal of mistuning would be tolerable before communication efficacy dropped below a predetermined threshold.
Although pitch contrasts can be quite organized in language, as demonstrated by discrete level tone languages, there is no question that the primary dimension for organized sound contrasts in language is timbre. This can be shown by a thought experiment. Imagine asking a speaker of a language to listen to computer-synthesized monologue in which all sentences are rendered on a monotone. For speakers of nontone languages such as English, such speech would still be highly intelligible, if artificial. For speakers of tone languages, there may be a loss of intelligibility, but it is unlikely that intelligibility would drop to zero, especially in languages such as Mandarin, in which a syllable’s tone may be correlated with other cues (such as duration and amplitude; Whalen & Xu, 1992).22 Now conduct the converse experiment: Allow the pitch of the synthesized sentences to vary normally but replace all phonemes with one timbre, say the vowel /a/. Intelligibility would be reduced to zero for all languages, irrespective of whether they were tone languages or not (except perhaps in rare cases such as Chinantec, in which it is claimed that the combination of tone and stress results in over 30 distinctions that can be communicated with minimal ambiguity; cf. the previous section).
Speech is thus fundamentally a system of organized timbral contrasts. (One might argue that durational patterning is also fundamental to speech, but without timbral contrasts there would be no basis for defining distinct phonemes or syllables, and hence no basis for making durational contrasts.) The human voice is the supreme instrument of timbral contrast. A survey of languages reveals that the human voice is capable of producing timbres corresponding to ~800 distinct phonemes, and this represents only phonemes known from extant languages (Maddieson, 1984). Of course, no single speaker or language uses this many contrasts: Phoneme inventories range in size from 11 (5 vowels and 6 consonants in Rotokas, a language of Papua New Guinea) to 156 (28 vowels and 128 consonants in !Xóõ, a Khoisan language from South Africa), with the average inventory size being 27 phonemes (Traill, 1994; Maddieson, 1999).
As noted in section 2.2.4, musical systems based on sequences of organized timbral contrasts are rare. In terms of comparing music and language, why then delve into the organization of timbre in speech? There are two principal reasons for doing so. First, timbre is the primary basis for linguistic sound categories. Because the focus of the next section (section 2.4) is on comparing sound categorization mechanisms in speech and music, understanding the physical basis of speech sound categories is essential. The second reason is that an understanding of timbre in language provides the basis for examining the relationship between the musical and linguistic timbres of a culture. For these reasons, the remainder of this section (2.3.3) provides a brief overview of timbral contrasts in language. Readers already knowledgeable about speech acoustics may wish to skip ahead to the subsection “Mapping Linguistic Timbral Contrasts Onto Musical Sounds” below.
The timbral contrasts of speech result from continuous changes in the shape of the vocal tract as sound is produced from a variety of sources. Consider the English word “sleepy,” which is written in IPA symbols as /slipi/. The /s/ is produced by shaping the tongue into a groove so that a jet of air is forced across the teeth, resulting in acoustic turbulence that produces a hissing sound (a fricative). The /l/ and /i/, on the other hand, rely on the harmonically rich sound of vocal fold vibration (by lightly touching the fingers to the “Adam’s apple” one can feel the onset of vocal fold vibration associated with the transition from the /s/ to the /l/). This harmonically rich spectrum is sculpted in different ways for the /l/ and the /i/ by differing positions of the tongue in the vocal tract. Specifically, during the /l/, the tip of the tongue makes contact with the alveolar ridge (just behind the teeth), forming a pocket of air over the tongue. This air pocket acts to attenuate the spectrum of the sound source in a particular frequency region (Johnson, 1997:155). After the release of the /l/, the tongue body moves to a position high and forward in the mouth, where it serves to shape the voice spectrum in such a way that an /i/ sound is produced (cf. the following section, on vowels). During the /p/, vocal fold vibration stops as the mouth closes and pressure builds up in preparation for a burst of air. Soon after the burst,23 the vocal folds begin vibrating again as the mouth opens, completing the /p/ sound as the tongue moves into position for the final /i/.
This example illustrates two general properties of speech. First, the succession of timbral contrasts is extremely rapid: It takes about 500 milliseconds to utter the word “sleepy,” yielding an average of 1 phoneme every 100 ms, or 10 phonemes/sec. In connected speech, a rate of 10 phonemes per second is not at all unusual. The second property of speech illustrated by the above example is the rough alternation of consonants and vowels. This ensures a rapid succession of timbral contrasts, as consonants and vowels tend to have very different articulations, and consequently, distinct timbres. This occurs because consonants are typically produced via a narrowing or closure of the vocal tract, whereas vowels are associated with an unimpeded flow of air from the lungs through the vocal tract. X-ray movies of the vocal tract during speech show this succession of narrowings and openings as part of a remarkable choreography of moving articulators during speech.24
How are the timbres of a language organized with respect to each other? As the above example suggests, timbral contrasts in speech are intimately linked to the articulations that give rise to speech sounds. In fact, the most widely used modern taxonomy of speech sounds (the IPA) is based on articulatory features. For example, consonants are classified in terms of their manner of articulation and the place of their primary constriction. Manner refers to the kind of constriction made by the articulators when producing the consonant, and place refers to the location of this constriction in the vocal tract. The organization of some English consonants by manner and place is shown in Table 2.3. The table uses IPA symbols,25 and a schematic diagram showing some places of articulation used in the world’s languages is shown in Figure 2.16.
In Table 2.3, consonants that appear in pairs are distinguished by voicing (for example, /f/ and /v/ are both fricatives in which the frication noise is produced in exactly the same way, but the vocal folds vibrate during a /v/).
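The manner-by-place organization that Table 2.3 describes can be rendered as a small lookup structure. The sketch below is illustrative and non-exhaustive, using a handful of well-known English consonants; it is not a reproduction of the book’s table. Pairs list the voiceless member first, and None marks a cell with no voiceless member.

```python
# Manner-by-place classification for some English consonants (IPA symbols).
# Each cell holds a (voiceless, voiced) pair; None marks a gap.
CONSONANTS = {
    ("stop", "bilabial"):        ("p", "b"),
    ("stop", "alveolar"):        ("t", "d"),
    ("stop", "velar"):           ("k", "g"),
    ("fricative", "labiodental"): ("f", "v"),
    ("fricative", "alveolar"):    ("s", "z"),
    ("nasal", "bilabial"):       (None, "m"),
    ("nasal", "alveolar"):       (None, "n"),
}

def lookup(manner, place):
    """Return the (voiceless, voiced) pair at a given cell, or None if absent."""
    return CONSONANTS.get((manner, place))

# /f/ and /v/ share manner and place and differ only in voicing:
print(lookup("fricative", "labiodental"))
```

The two-key indexing makes the taxonomy’s point concrete: /f/ and /v/ occupy a single cell and are distinguished only by the voicing dimension, exactly as described above.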
Both the degree of timbral contrast in speech and the rate at which such contrast occurs are far in excess of anything produced by a nonvocal musical instrument. However, the organization of timbral contrasts in speech is not irrelevant to music. In fact, it may help explain why certain musical traditions with organized timbral contrasts are successful. One such tradition is North Indian tabla drumming, described earlier in section 2.2.4 (subsection “Example of a Timbre-Based Musical System”). As noted in that section, in tabla music the contrasting timbres are organized in terms of distinctive places and manners of articulation. The places are regions of the drum heads that are struck, the manners are the ways in which the drum heads are struck, and the articulators are the different fingers that do the striking. This example illustrates how a successful “timbre music” can be built from a speech-like system of organization. Another long-lived musical tradition which emphasizes timbral contrast is Tibetan tantric chanting. In this music, timbral contrasts slowly unfold while pitch maintains a drone-like pattern (Cogan, 1984:28-35). A key to the aesthetic success of this tradition may be that it takes an aspect of human experience that is normally experienced extremely rapidly (timbral speech contrast) and slows it down to a completely different timescale, thus giving the opportunity to experience something familiar in a new way.
Figure 2.16 Places of articulation in a cross-sectional view of the vocal tract. Numbered lines indicate some of the 17 named articulatory gestures (e.g., 2 is a labiodental maneuver in which the lower lip touches the upper teeth). Note the concentration of places of articulation near the front of the vocal tract. From Ladefoged & Maddieson, 1996.
A basic understanding of vowel production and acoustics is essential for those interested in comparing speech and music, because vowels are the most musical of speech sounds, having a clear pitch and a rich harmonic structure. The primary way that the vowels of a language differ from each other is in their timbre, and each language has a distinctive palette of vocalic timbres. The number of vowels in a language ranges from 3 in a small number of languages, including Arrernte (Australia), Pirahã (Amazonian rain forest), and Aleut (Alaska), to 24 in Germanic dialects, such as the Dutch dialect of Weert (P. Ladefoged, personal communication). A genetically diverse sample of languages reveals that the modal number of vowels in a human language is 5 (Maddieson, 1999). American English has 15 vowels, and is thus on the rich end of the vowel spectrum.26
Table 2.3 Manner Versus Place Classification for Some English Consonants
Vowel production has two main features. The first is voicing, referring to the vibration of the tensed vocal folds as air rushes past them, resulting in a buzzy sound source. This sound source has a frequency spectrum that consists of a fundamental frequency (F0, corresponding to the perceived pitch) and a large number of harmonics. The strongest harmonics are in the bass, typically being the first 5 or 6 multiples of the fundamental. Vowels spoken in isolation almost always have some F0 movement, and this is part of what gives a vowel its “speechy” quality (Sundberg, 1987).
The second key feature for vowels is the tongue’s position in the vocal tract, which results in acoustic resonances that filter the underlying source spectrum via the emphasis of certain frequency bands (Figure 2.17).
The position and sharpness of these resonances, called formants, provide the vowel with its characteristic timbre, which in turn determines its linguistic identity. The positions of the first two formants (F1 and F2) are the most important cues for vowel identity, and the vowels of any language can be represented by placing them on a graph whose axes are the center frequencies of the first and second formant (Figure 2.18).27
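The source-filter picture sketched above (a harmonic glottal buzz shaped by formant resonances) can be illustrated numerically. The following is a minimal sketch, not a production speech synthesizer: the formant frequencies and bandwidths are approximate textbook values for an /i/-like vowel, and a simple one-pole lowpass stands in for the glottal spectral tilt.

```python
import numpy as np

def resonator(x, freq, bw, fs):
    """Second-order IIR resonator: boosts energy near its center frequency.
    This is the standard digital formant-resonator recurrence."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2.0 * np.pi * freq / fs
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def synth_vowel(f0, formants, fs=16000, dur=0.3):
    """Impulse-train 'glottal buzz' (harmonics at multiples of f0),
    tilted toward low frequencies, then passed through formant resonators."""
    n = int(fs * dur)
    src = np.zeros(n)
    src[:: int(fs / f0)] = 1.0                 # flat harmonic spectrum
    for i in range(1, n):                      # crude glottal spectral tilt
        src[i] += 0.97 * src[i - 1]
    y = src
    for freq, bw in formants:                  # cascade one resonator per formant
        y = resonator(y, freq, bw, fs)
    return y / np.max(np.abs(y))

def band_energy(sig, fs, lo, hi):
    """Summed spectral power between lo and hi Hz."""
    spec = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
    return spec[(freqs >= lo) & (freqs < hi)].sum()

# An /i/-like vowel: low F1, high F2 (values approximate)
vowel = synth_vowel(f0=100, formants=[(300, 80), (2300, 120)])
```

The filtered source shows energy peaks near the resonator frequencies rather than in neighboring bands, which is exactly the formant structure visible in the spectra of Figure 2.17.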
As Figure 2.18 implies, changing either F1 or F2 can change a vowel’s timbre and eventually move it far enough so that it changes linguistic identity. Of course, where the boundaries between vowel phonemes lie depends on the language in question. Note that even within a language, the acoustic boundaries between vowels are not sharp. Vowels seem to be organized into acoustic regions with better or more prototypical exemplars in the middle of the regions and poorer exemplars toward the boundaries of regions (cf. Kuhl, 1991). Of course, the precise location of these regions (and hence of the best exemplars) depends on the voice being analyzed. For example, /i/ vowels produced by adult males occupy a somewhat different region in F1-F2 space from those produced by adult females or children due to differences in vocal tract length (shorter vocal tracts produce higher resonant frequencies). Listeners normalize for these differences in judging how good an exemplar a vowel is (Nearey, 1978; Most et al., 2000).
Because there is a strong relationship between articulatory configuration and the acoustics of a vowel, vowel relations are often described in articulatory terms. Thus linguists frequently speak of front vowels, high vowels, mid vowels, and so on. These terms refer to the position of the tongue body during vowel production. Articulatory vowel space thus has 4 distinct corners (high front, high back, low front, and low back), corresponding to the “cardinal” vowels i, u, a, a (as in “beet,” “boot,” “father,” and “awful”). Figure 2.19 shows the full IPA vowel chart: a two-dimensional grid of tongue body position, in which the vertical dimension is tongue height and the horizontal dimension is tongue backness. (Note that most languages have only 5 vowels: No language has all the vowels shown in the IPA chart; cf. Maddieson, 1984.)
Vowel systems and vowel perception have many interesting features (Liljencrants & Lindblom, 1972; Stevens, 1989, 1998; Ladefoged, 2001; Diehl et al., 2003), which unfortunately cannot be discussed here because our focus is on comparison with music. (One aspect of vowel perception, the “perceptual magnet effect,” is discussed later in section 2.4.3.) Here I simply mention one other aspect of vowel production relevant to comparative issues. This is the fact that the acoustic structure of a vowel can vary a great deal depending on context. For example, if one measures the formant frequencies of an English vowel spoken in a word produced in isolation (e.g., from an individual’s reading of a word list in a laboratory) and then measures the formants of that same vowel in rapid and informal speech, the formants in the latter case may undershoot their laboratory values by a substantial amount (i.e., by hundreds of Hz). At the other extreme, if the same word is spoken in the context of infant-directed speech, a form of very clear speech, the formant values may substantially overshoot their values in clear adult-directed speech (Kuhl et al., 1997). Thus part of normal speech includes variation between hypo- and hyperarticulation of vowels depending on speech context (Lindblom, 1990). The amount of acoustic variation is likely to far exceed the contextual variation of musical sounds (e.g., variation in the size of a pitch interval or in the spectrum of a given note produced by one instrument over the course of a piece).
Figure 2.17 Examples of vocal tract configurations and frequency spectra for the vowels /i/ (in “beat”) and /æ/ (in “bat”) are shown in the top and bottom row, respectively. In the frequency spectra, the jagged lines show the harmonics of the voice, and the smooth curves show mathematical estimates of vocal tract resonances. The formants are the peaks in the resonance curves. The first two formant peaks are indicated by arrows. The vowel /i/ has a low F1 and high F2, and /æ/ has a high F1 and low F2. These resonances result from differing positions of the tongue in the vocal tract. Vocal tract drawings courtesy of Kenneth Stevens. Spectra courtesy of Laura Dilley.
Figure 2.18 Formant frequencies of 10 English vowels as uttered by a range of speakers (x axis = F1, y axis = F2). The ovals enclose the majority of tokens of vowels in a single perceptual category. Symbols follow IPA conventions. Adapted from Peterson & Barney, 1952.
Up till this point, I have been purposefully vague about the relationship between acoustic spectrum, perceived timbre, and phonemic identity in speech, in order to introduce basic concepts. However, it is time to be more specific. The key point is that a snapshot of a static spectrum, such as those of the vowels in the previous section, should not be confused with “a timbre,” and a single timbre should not be confused with “a phoneme.”
The idea that a spectrum should not be confused with a timbre has already been introduced in the discussion of musical timbre in section 2.2.4. For example, the spectrum of a musical note may change (e.g., as a function of the loudness with which it is played) but the timbre may remain the same. Furthermore, the characteristic timbre of a sound may rely on dynamic changes in the spectrum. For vowels, this is illustrated by diphthongs such as /ai/ (the vowel in the English word “I”), in which the tongue moves from a low central to a high front position, with a corresponding decrease in F1 and increase in F2. For consonants, this is illustrated by stop consonants such as /b/ and /d/, in which spectral change is a vital part of creating their characteristic timbre. Such consonants are always coproduced with a vowel, and are acoustically characterized by a period of silence (corresponding to the vocal tract closure), followed by a sudden broadband burst of energy (as the vocal tract constriction is released), followed by formant transitions that typically last about 50 ms. These transitions occur because the acoustic resonance properties of the vocal tract change rapidly as the mouth opens and the tongue moves toward its target position (Stevens, 1998).
Figure 2.19 The IPA vowel chart. “Front,” “back,” “close,” and “open” refer to the position of the tongue body in the mouth (close = high, open = low). From Ladefoged, 2006. To hear the sounds in this figure, visit: http://hctv.humnet.ucla.edu/departments/linguistics/VowelsandConsonants/course/chapter1/vowels.html
The second point, that a timbre should not be confused with a phoneme, can easily be demonstrated by synthesizing different versions of a vowel that differ slightly in F1 and F2 and that are noticeably different in sound quality, but nevertheless are still perceived as the same vowel (Kuhl, 1991).
Given these points, what does it mean to say that speech is an organized system of timbral contrasts? All that is intended by this statement is that phonemes have different timbres, and that because speech can be analyzed as a succession of phonemes, it can also be considered a succession of timbres.
Because spectrograms are fundamental in speech research and are used in a comparative study of spoken and musical timbre described in the next section, a brief introduction to them is given here. A spectrogram is a series of spectra taken at successive points in time that allows a scientist to view the temporal evolution of the frequency structure of a signal. In a spectrogram, time is plotted on the horizontal axis, frequency on the vertical axis, and spectral amplitude is plotted using a gray scale, so that higher amplitudes are darker. Spectrograms of a voice saying the vowels /i/ and /æ/ are shown in Figure 2.20.
When compared to the static spectra in Figure 2.17 above, one salient difference is obvious: The individual harmonics can no longer be seen. Instead, one simply sees broad energy bands that correspond to the formants, as well as thin vertical striations corresponding to the fundamental period of the vocal fold vibration (F0 is the inverse of this period). This is due to the fact that these spectrograms were made with a coarse frequency resolution, in order to obtain a good time resolution: There is a fundamental tradeoff between these two types of resolution in spectral analysis (Rosen & Howell, 1991).
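This time-frequency tradeoff is easy to demonstrate numerically: a long analysis window (a narrowband spectrogram) resolves closely spaced frequency components, while a short window (wideband) smears them together. The sketch below is purely illustrative; the window lengths, the pair of test tones, and the peak-picking threshold are arbitrary choices, not values from any speech-analysis standard.

```python
import numpy as np

def stft_mag(sig, fs, win_ms):
    """Short-time Fourier magnitudes: Hann-windowed frames, half-window hop.
    Frequency resolution scales as 1/window_length; time resolution as window_length."""
    n = int(fs * win_ms / 1000)
    hop = n // 2
    win = np.hanning(n)
    frames = [sig[i:i + n] * win for i in range(0, len(sig) - n, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def count_peaks(spectrum, thresh_ratio=0.5):
    """Count interior local maxima above a fraction of the global maximum."""
    t = spectrum.max() * thresh_ratio
    return sum(
        1 for i in range(1, len(spectrum) - 1)
        if spectrum[i] > t and spectrum[i] >= spectrum[i - 1] and spectrum[i] > spectrum[i + 1]
    )

# Two tones 100 Hz apart stand in for two adjacent voice harmonics.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
sig = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 600 * t)

narrow = stft_mag(sig, fs, win_ms=40).mean(axis=0)   # ~25 Hz resolution
wide = stft_mag(sig, fs, win_ms=5).mean(axis=0)      # ~200 Hz resolution
```

With the 40 ms window the averaged spectrum shows two separate peaks; with the 5 ms window the two tones merge into a single broad band, just as individual harmonics merge into formant bands in a wideband spectrogram.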
Of course, if speech consisted of a succession of static vowels with little formant movement, there would be no need for spectrograms. One could simply plot a succession of static spectra (as in Figure 2.17) with a record of when one changed to another. However, one of the defining features of speech is that spectral structure changes rapidly over time. Figure 2.21a shows a spectrogram of a speaker saying a sentence of British English, together with the acoustic waveform of the sentence (cf. Sound Example 2.6).
Figure 2.20 Spectrograms of a male speaker saying the words “beat” and “bat,” to illustrate the vowels /i/ and /æ/. Arrows point to the first and second formants of the vowels. Note the low F1 and high F2 in /i/, and the inverse pattern in /æ/.
The positions of all the vowels in this utterance are marked in Figure 2.21b, and word onsets in Figure 2.21c. Three important points about speech are obvious from these plots. First, the formants of speech, visible as the dark undulating lines, are in almost constant motion. For example, notice the dramatic formant movement in vowels 9 and 16 of Figure 2.21b (the first vowel of “opera” and of “success,” respectively), due to coarticulation with the following consonant (/p/ and /k/, respectively). Second, vowels are not the only phonemes with formants (also evident from Figure 2.21b). Formants figure prominently in the spectrum of many speech sounds, because every vocal tract configuration is associated with acoustic resonances that shape the spectrum of the sound source. (This will be important for a subsequent discussion of the perception of /l/ and /r/ by English vs. Japanese speakers.) Third, as seen in Figure 2.21c, the boundaries of words do not necessarily correspond to obvious acoustic discontinuities in the waveform or spectrogram, which is one reason why in listening to a foreign language we may have no idea where one word ends and the next begins. This “segmentation problem” is particularly keen for infants, who cannot rely on a preexisting vocabulary to help them extract words from the flow of sound. As we shall see in Chapter 3, speech rhythm is thought to play a role in helping infants and adults segment words from connected speech (Cutler, 1994).
Figure 2.21a Acoustic waveform and spectrogram of a female British English speaker saying the sentence, “The last concert given at the opera was a tremendous success.”
Figure 2.21b Spectrogram of the same sentence as in 2.21a with the onset and offset of each of the 17 successive vowels marked. (Note that “opera” is pronounced “opra”).
All human cultures have two distinct repertoires of timbres: linguistic and musical. In any given culture one can ask, “What relationship exists between these two sound systems?” One approach to this question is to examine cases in which nonsense syllables or “vocables” are used in systematic ways to represent musical sounds. Vocables are typically shared by members of a musical community and are used for oral teaching of musical patterns. Such traditions are found in many parts of the world, including Africa, India, Japan, and China (Locke & Agbeli, 1980; Kippen, 1988; Hughes, 2000; Li, 2001). (For a study of the informal use of vocables by Western orchestral musicians when singing melodies without text, see Sundberg, 1994.) A familiar Western example is solfège, in which the notes of the scale are represented by the syllables do, re, mi, and so forth. Of course, in solfège there is no systematic mapping between vocable sounds and the musical sounds they represent. More interesting are cases in which there seems to be some acoustic and perceptual resemblance between speech sounds and musical sounds, especially if this resemblance lies in the domain of timbre. Such cases of timbral “sound symbolism” (Hinton et al., 1994) allow one to ask what cues humans find salient in musical timbres, and how they go about imitating these cues using linguistic mechanisms.
Figure 2.21c Spectrogram of the same sentence as in 2.21a, with word onsets marked.
The North Indian tabla, described earlier in section 2.2.4 (subsection “Example of a Timbre-Based Musical System”) is a prime candidate for a study of timbral sound symbolism. Players of tabla have an intuition that the vocables in this tradition resemble the drum sounds. However, a voice and a drum make sounds in very different ways, presenting an excellent opportunity to compare corresponding speech and drum sounds and ask what makes them sound similar despite radically different production mechanisms. Patel and Iversen (2003) conducted one such study, examining eight vocables and their corresponding drum sounds as spoken and drummed by six professional tabla players. (Examples of each drum sound can be heard in Sound Examples 2.3a-h, and the corresponding vocables can be heard in Sound Examples 2.7a-h. For a description of how the drum is struck for each of the sounds, refer back to Table 2.1.)
We organized the vocables into four pairs to focus on distinctive timbral contrasts between similar vocables. (For example, the vocables “Tin” and “Tun” differed only by the identity of the vowel in each syllable.) For each vocable pair, we examined acoustic differences between the vocables, as well as acoustic differences between the corresponding drum sounds. The question of interest was how timbral differences between drum sounds were mapped onto linguistic cues. We found a different kind of mapping for each pair of vocables and drum sounds we examined. Table 2.4 shows some of these mappings. (Quantitative data can be found in Patel & Iversen, 2003.) The table indicates that tabla drummers are sensitive to a variety of timbral contrasts between drum sounds, and find diverse ways of mapping these onto timbral (or pitch) contrasts in speech.
Table 2.4 Mapping Between Drum Sounds and Their Corresponding Vocables in North Indian Tabla Drumming
The study by Patel and Iversen (2003) also provided an example of how a language exploits its special phonetic inventory to capture a musical timbral contrast. This example concerns the vocable /dha/. Spoken /dha/ uses a heavily aspirated form of a /d/ that is unique to Sanskrit-based languages (Ladefoged & Maddieson, 1996). Recall that /dha/ represents a combination stroke on the higher and the lower pitched drum. Why might this sound be particularly apt for representing the combined striking of the two drums? The answer lies in the fine acoustic detail of the drum sound and the corresponding speech sound. On the drums, /dha/ is a combination of two strokes: a /ta/ stroke on the right drum and a /ghe/ stroke on the left. The spectrogram in Figure 2.22a shows that this creates a composite sound with the lowest frequency component contributed by the sound of the left drum (the lower pitched bayan) and higher frequency components contributed by the sound of the right drum (the higher pitched dayan).
Crucially, we found that the frequency of the left drum showed a two-stage structure: a short initial frequency modulation (labeled FM in the figure, lasting about 200 ms), followed by a longer period of stability. When we measured the ratio of energy in the lowest frequency versus the higher frequencies, during versus after the FM, we found that this ratio was significantly larger during the FM portion. Turning to the spoken vocable /dha/, we also noted a clear two-stage pattern in the acoustic structure of this syllable. An initial portion of heavy aspiration (the /h/ sound; lasting about 90 ms) was followed by the vowel /a/ with stable formant structure (Figure 2.22b). The early, aspirated portion is characterized by a particular type of voicing known as aspirated (breathy) voicing, during which the vocal folds are vibrating in an unusual mode. Because of the heavy air stream moving past them they do not fully close on each cycle, which has a dramatic influence on the source spectrum (Ladefoged, 2001). In particular, there is very little energy in higher frequency harmonics of the voice. Thus when we measured the ratio of energy in the fundamental frequency versus in a frequency range spanning the first two formants, we found that this ratio was significantly greater during aspiration. Thus both the drum sound and the speech sound showed a two-stage acoustic structure with the initial stage marked by a dominance of low-frequency energy. This dominance was achieved in completely different ways, but the perceptual end result was similar (Figure 2.22c).
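The kind of measurement described here, comparing low- versus high-frequency energy across two temporal stages of a sound, amounts to a band-energy ratio computed on each segment. The sketch below illustrates the idea on a synthetic two-stage signal; the band edges and segment construction are illustrative placeholders, not the values used in Patel and Iversen's published analysis.

```python
import numpy as np

def lf_hf_ratio(segment, fs, lf_band=(0, 300), hf_band=(300, 3000)):
    """Ratio of summed spectral power in a low band to a high band.
    Band edges here are illustrative, not the published analysis values."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), 1.0 / fs)

    def band(lo, hi):
        return power[(freqs >= lo) & (freqs < hi)].sum()

    return band(*lf_band) / band(*hf_band)

# Synthetic two-stage sound: low-frequency energy dominates early,
# high-frequency energy dominates late (a cartoon of the /dha/ pattern).
fs = 8000
t = np.arange(int(0.2 * fs)) / fs
early = np.sin(2 * np.pi * 150 * t) + 0.1 * np.sin(2 * np.pi * 1000 * t)
late = 0.1 * np.sin(2 * np.pi * 150 * t) + np.sin(2 * np.pi * 1000 * t)
```

For the synthetic signal, lf_hf_ratio(early, fs) comes out well above 1 and lf_hf_ratio(late, fs) well below it, mirroring the reported dominance of low-frequency energy in the initial stage of both the drummed and the spoken /dha/.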
Figure 2.22a Waveform and spectrogram of drummed /dha/, a combination stroke on the dayan and bayan. Note the brief period of frequency modulation (FM) in the bayan sound.
Figure 2.22b Waveform and spectrogram of spoken /dha/. Note the period of heavy aspiration between the onset of the syllable (the release of the /d/) and the onset of stable formant structure in the vowel.
An important question is whether familiarity with the language and music of a particular tradition is needed in order to perceive the similarity between the timbres of the vocables and their associated musical sounds. If not, then this is strong evidence that the mappings are based on real acoustic and perceptual resemblance and not just convention. To investigate this question, we presented vocables and their associated drum sounds in pairs (e.g., spoken and drummed tin and tun) to naive listeners who knew neither Hindi nor tabla drumming, and asked them to try to guess the correct pairing. We found that listeners performed remarkably well (about 80% correct, on average) for all pairings except /tra/ and /kra/, in which the acoustic difference between the two drum sounds is very hard to discern.
In summary, an empirical study of tabla sounds demonstrates strong connections between the musical and linguistic timbres of a culture, and suggests that similar empirical studies should be conducted in other cultures, particularly in the rich percussive traditions of Africa and the African diaspora.
Having described some fundamental aspects of speech acoustics, it is now time to take a cognitive perspective and examine speech sounds as learned sound categories. Evidence abounds that experience with speech produces a mental framework of sound categories that influences the perception of linguistic sound. As with music, such a framework is adaptive because tokens of a given linguistic sound can vary in their acoustic structure. Thus the framework allows a listener to transform acoustically variable tokens into stable mental categories. The following two sections provide evidence for learned sound categories in speech. The first section reviews behavioral evidence from studies of consonant perception, and the second section reviews neural evidence from studies of vowel perception. I have chosen these particular studies because they invite parallel studies in music.
Figure 2.22c Measurements of the ratio of low frequency (LF) to high frequency (HF) energy during the two acoustic stages of the drummed /dha/ and spoken /dha/ (mean and standard error across 30 tokens played/spoken by 6 speakers are shown). In both cases, the early portion of the sound is dominated by low frequency energy, though by completely different mechanisms.
It has long been known that sounds that function as different categories in one language can be difficult for speakers of another language to discriminate. A classic example of this is the perception of English /l/ and /r/ by Japanese speakers (Goto, 1971). In Japanese, these two sounds do not function as different phonemes, but are broadly consistent with a single /r/-like phoneme.
Iverson et al. (2003) helped illuminate how American English versus Japanese speakers hear this linguistic contrast. They created an “acoustic matrix” of /ra/ and /la/ syllables that differed systematically in acoustic structure (second and third formant frequencies, F2 and F3; Figure 2.23a).
They then asked English and Japanese speakers to perform three different kinds of perceptual tasks with these stimuli. In the first task, listeners simply labeled each stimulus in terms of their own native-language phonemes, and rated how good an exemplar of that phoneme it was. In the second task, listeners heard the stimuli in pairs (every possible pairing except self-pairing) and rated their perceived similarity. In the third task, listeners once again heard stimuli in pairs, half of which were the same and half of which were different. (When different, they varied in F3 only; F2 was held constant at 1003 Hz.) The task was simply to say whether the members of a pair were the same or different.
The labeling task showed that English listeners labeled stimuli on the left half of this acoustic matrix as /ra/ and on the right half as /la/, whereas Japanese listeners labeled all the stimuli in the matrix as /ra/.28 The discrimination task revealed that English listeners showed a peak in discrimination when the two tokens straddled their two different phoneme categories, whereas Japanese speakers showed no peak at the corresponding point. Thus English and Japanese listeners were hearing the same physical stimuli in rather different ways. One representation of these perceptual differences came from the similarity-rating task. The similarity scores were analyzed using multidimensional scaling (MDS), a technique that maps stimuli in a two-dimensional space such that stimuli that are perceptually similar lie close together, and vice versa. The resulting maps for English and Japanese listeners are shown in Figure 2.23b. The order of the stimuli on each dimension matched the formant frequencies (F2 in the vertical dimension and F3 in the horizontal dimension), but the two groups of listeners showed very different perceptual warpings of the acoustic space. Specifically, English speakers were most sensitive to F3 differences, which distinguished /l/ and /r/. In contrast, Japanese speakers were more sensitive to variation in F2. Thus the sound categories of the two languages had warped the perception of a common set of acoustic cues in very different ways.
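The core mechanics of MDS can be illustrated with the classical (Torgerson) variant, which recovers spatial coordinates from a distance matrix by double-centering and eigendecomposition. This is a generic sketch of the technique, not the specific MDS procedure used in the Iverson et al. study; the four "stimuli" below are hypothetical points generated so the embedding can be checked exactly.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed an n x n distance matrix in k dimensions.
    Double-center the squared distances, then scale the top-k eigenvectors."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ d2 @ j                     # inner-product (Gram) matrix
    vals, vecs = np.linalg.eigh(b)            # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]        # take the largest eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Hypothetical dissimilarity data, generated from known 2-D points so that
# the embedding can be verified against the original distances.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
embedding = classical_mds(dist, k=2)
```

Pairwise distances among the embedded points reproduce the input matrix (up to rotation and reflection), which is the sense in which an MDS map like Figure 2.23b preserves rated similarity as spatial proximity.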
Figure 2.23a Schematic of stimulus grid used by Iverson et al. (2003) to explore the perception of “L” and “R” by American and Japanese listeners. F2 and F3 refer to the second and third speech formant, respectively. The frequencies of F2 and F3 are equally spaced on the mel scale (a psychoacoustic scale for pitch perception).
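The mel spacing mentioned in the caption can be computed with a standard formula (the common O'Shaughnessy form, m = 2595 · log10(1 + f/700)); equal steps in mels approximate equal perceptual pitch steps. The helper below sketches how a mel-equidistant stimulus grid of this kind might be generated; the endpoint frequencies are arbitrary examples, not the grid values used in the study.

```python
import math

def hz_to_mel(f):
    """Common mel-scale formula: equal mel steps approximate equal pitch steps."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spaced(f_lo, f_hi, n):
    """n frequencies equally spaced in mels between f_lo and f_hi (inclusive)."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    return [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n - 1)) for i in range(n)]
```

Because the scale is compressive, mel-equal steps grow wider in Hz as frequency rises, so grid values expressed in Hz cluster more tightly at the low-frequency end.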
The existence of such dramatic perceptual differences depending on native language naturally raises the question of how early in development one’s native language begins to influence sound perception. In a classic set of studies, Janet Werker and her colleagues investigated the perception of native and nonnative consonant contrasts by infant and adult listeners (Werker et al., 1981; Werker & Tees, 1984, 1999). They showed that prior to 6 months of age, Canadian infants could discriminate certain speech contrasts from foreign languages, such as the contrast between two subtly different forms of /t/ in Hindi, even though Canadian adults could not. However, this ability declined rapidly with age and was gone by 1 year of age. (In contrast, infants from Hindi-speaking households maintained this ability as they grew, as would be expected.) This demonstrated that the language-specific framework for sound perception was being formed long before the infants were competent speakers of their native language (cf. Kuhl et al., 1992, for work on vowels).
This work inspired subsequent studies on the perception of other nonnative contrasts. One important finding was that adults can retain the ability to distinguish subtle nonnative contrasts if they involve speech sounds that do not resemble anything in the native language. Specifically, Best et al. (1988) found that both American infants and adults could discriminate between click sounds from Zulu, though they had no prior experience with these speech sounds. (Clicks are a rare type of phoneme found in Southern African languages, and are produced by creating suction in the mouth followed by an abrupt release of the tongue.) Findings such as these led to the question of why sensitivity declines for some nonnative contrasts but not others, and helped inspire a number of developmental linguistic models of speech perception (Kuhl, 1993; Best, 1994; Werker & Curtin, 2005). For example, in Best’s Perceptual Assimilation Model (PAM), an adult’s ability to hear a nonnative contrast depends on the relation of the contrasting phonemes to one’s native sound system (cf. Best et al., 2001). If both phonemes assimilate equally well to a native category, discrimination is poor (e.g., Japanese perception of English /l/ vs. /r/). If they assimilate as good versus poor members of a native category, discrimination is better, and if they assimilate to two different categories, discrimination is better still. The best discrimination is predicted when the nonnative phonemes fail to assimilate to any native speech sounds: This is the situation Best et al. suggest for the Zulu click study. (As will be discussed in the section on empirical comparative studies, Best and Avery, 1999, later provided intriguing evidence that these click sounds were heard as nonspeech by the English listeners.)
Figure 2.23b Representation of perceived similarity between tokens of the stimulus grid in Figure 2.23a, based on multidimensional scaling. See text for details. Black circles were identified as “R,” and white circles were identified as “L” by the American listeners (the white circle in the Japanese data was identified as a “W”).
The work of Kuhl, Werker, Best, and their colleagues provides an important background for scientists interested in studying the development of learned sound categories in music. Their research invites parallel studies in the musical domain, as will be discussed further in section 2.4 on sound category learning.
A useful tool for studying the structure of sound categories was introduced in section 2.2.3 (subsection “Pitch Intervals and Neuroscience”): the mismatch negativity (MMN). The logic of the MMN paradigm is easily adapted to studies of speech perception. Specifically, a common sound from the native language is presented as the standard, and then another speech sound is presented as the deviant. If the MMN is sensitive only to physical aspects of sounds, then the response to the deviant sound should depend only on its physical distance from the standard. If, however, the MMN shows sensitivity to the linguistic status of the deviant (i.e., a greater response when the deviant becomes a different phoneme in the language), then this is evidence that the brain is responding to sound change in terms of a mental framework of learned sound categories. Such evidence was provided by Näätänen et al. (1997), who studied the MMN to vowels in Finns and Estonians. The standard was a vowel (/e/) that occurred in both languages, and the deviants were four different vowels that ranged between /e/ and /o/ via manipulation of the second formant (F2). Three of these deviants corresponded to vowels used in both languages, but one deviant (/õ/) was a distinctive vowel only in Estonian (in Finnish, this sound simply fell between two other frequently used vowel sounds). Näätänen et al. recorded the MMN in Finns and Estonians to the different deviants, and found that only for the vowel /õ/ did the MMN differ between the two groups. Estonians showed a significantly larger MMN to this sound, suggesting that a learned sound category was modulating the brain response.
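The stimulus logic of the oddball paradigm described above (a frequent standard interrupted by occasional deviants) can be sketched in a few lines. The deviant probability and the no-consecutive-deviants constraint below are illustrative assumptions, not the parameters of Näätänen et al. (1997).

```python
import random

def oddball_sequence(n_trials, p_deviant=0.15, seed=0):
    """Generate a standard/deviant trial sequence for an MMN-style oddball paradigm.

    Deviants occur with probability p_deviant, under the common constraint
    (an assumption here) that two deviants never occur back to back.
    """
    rng = random.Random(seed)
    seq = []
    for _ in range(n_trials):
        if seq and seq[-1] == "deviant":
            seq.append("standard")          # enforce no consecutive deviants
        elif rng.random() < p_deviant:
            seq.append("deviant")
        else:
            seq.append("standard")
    return seq

seq = oddball_sequence(1000)
```

The MMN is then computed offline as the difference between the averaged brain responses to deviants and standards.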
As with most MMN studies, the MMN in this study was collected without requiring any task from the participants, who read a book during the brain recordings. Thus the MMN reflects a preattentive response and provides a unique window on mental frameworks for sound perception. The findings of Näätänen et al. have been corroborated in other languages (cf. Näätänen & Winkler, 1999, Phillips et al., 2000, and Kazanina et al., 2006, for MMN research on consonants as sound categories). They have also led to a brain-based approach for studying the development of linguistic sound categories in infants (Cheour et al., 1998). As mentioned in section 2.2.3, the MMN is well suited to studying the development of musical sound categories, and offers a chance to study the development of both kinds of categories in parallel, using a common methodology.
Our overview of musical and linguistic sound systems has shown that pitch and timbre are organized quite differently in the two domains. For example, ordinary speech does not contain stable pitch intervals, and musical sequences are rarely based on organized timbral contrasts. Thus comparing “like to like” can leave the impression that musical and linguistic sound systems have little in common. From the perspective of cognitive neuroscience, however, the real interest lies in a deeper similarity that lies beneath these surface differences, namely, that both systems depend on a mental framework of learned sound categories (cf. Handel, 1989). Indeed, the fact that the mind has found two entirely different ways of building organized sound category systems suggests that sound category learning is a fundamental aspect of human cognition. Thus a natural focus for comparative research on musical and linguistic sound systems is on the mechanisms that create and maintain learned sound categories. To what extent are these mechanisms shared between domains? One possibility is that these mechanisms have little in common. Indeed, evidence for cognitive and neural dissociations between musical and linguistic sound systems would seem to indicate that this is the case (reviewed in section 2.4.1 below). Another possibility is that music and language share mechanisms for sound category learning to an important degree (McMullen & Saffran, 2004). One might call this the “shared sound category learning mechanism hypothesis” (SSCLMH). One implication of this hypothesis is that a clear conceptual distinction must be made between the end products of development, which may be domain specific, and developmental processes, which may be domain general.
In sections 2.4.2 and 2.4.3, I review studies which support the SSCLMH, focusing on pitch intervals and chords as musical sound categories and vowels and consonants as linguistic sound categories. Because this is a young area of comparative research, in section 2.4.5, I devote some time to pointing the way to future work aimed at exploring shared learning mechanisms. Before embarking on these sections, I first deal with apparent counterevidence in the following section, and show why this evidence is in fact not incompatible with the SSCLMH.
There are good reasons to believe that the brain treats spoken and musical sound systems differently. First, focal cortical damage can lead to dramatic dissociations whereby the ability to interpret speech is profoundly impaired, yet the perception of musical sounds is intact, or vice versa (Poeppel, 2001; Peretz & Coltheart, 2003). Second, there is ample evidence from neuropsychology and neuroimaging that the two cerebral hemispheres have different biases in sound processing. Many musical pitch perception tasks show a greater dependence on right hemisphere circuits, whereas many linguistic phonemic tasks show a greater reliance on the left hemisphere (e.g., Zatorre et al., 1996, 2002; Stewart et al., 2006). Third, there are arguments for a “speech mode” of perception that contravenes normal principles of auditory perceptual organization (e.g., Liberman, 1996).
In fact, none of these findings contradicts the idea of shared mechanisms for the learning of sound categories in the two domains. I shall treat each finding in turn.
Dissociations for perceiving spoken versus musical sounds after brain damage simply indicate that sound category representations in the two domains, once learned, do not completely overlap in terms of their location in the brain. Whether or not similar mechanisms are used to create these representations is an orthogonal question.
An analogy might be helpful here. Imagine a factory that makes cars and motorcycles, and that keeps the finished vehicles in different rooms. It is possible that damage to the factory (such as a fire) could destroy just the rooms containing cars, or containing motorcycles, but this tells us nothing about the degree of overlap in the tools and processes used to make the two kinds of vehicles.
Given that the auditory system tends to break sounds down into their acoustic components and map these components in orderly ways, it seems likely that long-term representations of spoken and musical sounds will not occupy exactly the same regions of cortex. This could simply reflect the fact that the sound categories of speech and music tend to be different in acoustic terms (e.g., relying on timbre vs. pitch, respectively). In fact, studies aimed at localizing brain signals associated with the perception of phonetic versus musical sounds do report some separation (e.g., Tervaniemi et al., 1999, 2000; cf. Scott & Johnsrude, 2003). For example, Tervaniemi et al. (2006) had participants listen to a repeating speech sound (a two-syllable nonsense word) or a musical sound (saxophone tones) that were roughly matched in terms of duration, intensity, and spectral content. Brain scanning with fMRI revealed that both kinds of sounds produced strong activations in bilateral auditory cortex in the superior temporal gyrus (STG). However, as shown in Figure 2.24, speech sounds produced more activation in inferior and lateral areas of the STG than music sounds, which themselves produced more activation in the superior/medial surface of the STG and of Heschl’s gyrus (which houses primary auditory cortex). This finding supports the idea that lesions with slightly different locations could selectively impair speech versus music perception.
The idea that speech and music perception show left versus right hemispheric asymmetry in processing is an old and firmly entrenched idea in neuropsychology. Yet as we shall see, a closer examination of the evidence suggests that both language and music represent their sound categories bilaterally in auditory cortex.
Figure 2.24 Brain regions showing significantly more activation during perception of speech versus musical sounds (diagonal lines) or vice versa (stippled dots). STG = superior temporal gyrus, HG = Heschl’s gyrus (which contains primary auditory cortex), STS = superior temporal sulcus. From Tervaniemi et al., 2006.
It is well known that tasks that focus participants’ attention on phoneme perception are associated with greater left-hemisphere activity in neuroimaging studies, often involving a network that spans left superior temporal auditory cortex and left inferior frontal cortex (Zatorre et al., 1996). In contrast, many tasks involving musical pitch perception show a right hemisphere bias. Zatorre et al. (2002) suggest that this difference between speech and music is due to complementary anatomical and functional specializations of the two auditory cortices for processing the temporal versus spectral structure of sound (cf. Poeppel, 2003). According to this view, perception of the rapid but spectrally coarse timbral contrasts of speech relies more on left hemisphere circuits, whereas analysis of the slower but more spectrally refined pitch contrasts of music relies more on right hemisphere circuits.
However, there is intriguing evidence from a variety of studies that hemispheric asymmetries in auditory perception are not just a function of the physical structure of sound. For example, Best and Avery (1999) examined brain lateralization in the perception of Zulu click contrasts by Zulu versus English speakers. Clicks involve a rapid acoustic transition (~50 ms), which predicts a left-hemisphere processing bias according to the theory described above. However, only the Zulu listeners exhibited such a bias. Best and Avery suggest that this was because the English listeners, who were completely unfamiliar with these rare speech sounds, did not process them as speech. In contrast, the Zulu listeners heard the clicks as sound categories from their language. Thus it seems that leftward asymmetries in speech sound processing may be influenced by a sound’s status as a learned sound category in the language, and not just by its physical characteristics. Further evidence for this view comes from brain imaging research by Gandour et al. (2000) showing that the perception of tonal contrasts in Thai syllables activates left frontal cortex in Thai listeners (for whom the contrast was linguistically significant) but not in English and Chinese listeners. In other words, when pitch acts as a linguistically significant category, there is evidence for left rather than right hemisphere bias (cf. Wong et al., 2004; Carreiras et al., 2005).
What accounts for this leftward bias for the perception of speech sound categories? Hickok and Poeppel (2004) suggest that it is driven by the interface of two brain systems: a bilateral posterior system (in superior temporal auditory regions) involved in mapping phonetic sounds onto speech sound categories, and a left-lateralized frontal brain region involved in articulatory representations of speech. According to Hickok and Poeppel, the left hemisphere bias seen in various neural studies of speech perception (including the Zulu and Thai studies) occurs because the tasks given to participants lead them to map between their sound-based and articulatory-based representations of speech, for example, as part of explicitly segmenting words into phonemic components. These authors argue that under more natural perceptual circumstances, speech sounds would be processed bilaterally. In support of this idea, they point out that passive listening to speech is associated with bilateral activation of superior temporal cortex in brain imaging studies. Furthermore, cases of severe and selective difficulty with speech perception following brain damage (“pure word deafness”) almost always involve bilateral lesions to the superior temporal lobe (Poeppel, 2001).
What of hemispheric asymmetries for music? When assessing the evidence for musical hemispheric asymmetries, it is crucial to specify exactly which aspect of music is being studied. For example, studies involving the analysis of melodic contour tend to show a right-hemisphere bias, whether the pitch patterns are musical or linguistic (e.g., Patel, Peretz, et al., 1998). However, if one is interested in the cortical representation of sound categories for music, then it is necessary to focus on the perception of categorized aspects of musical sound, such as pitch intervals. In this light, a study by Liégeois-Chauvel et al. (1998) is particularly interesting. They examined music perception in 65 patients who had undergone unilateral temporal lobe cortical excisions for the relief of epilepsy. They found that excisions to either hemisphere impaired the use of pitch interval information, whereas only right hemisphere excisions impaired the use of pitch contour information (cf. Boemio et al., 2005). Thus the learned sound categories of musical intervals appear to have a bilateral representation in the brain, analogous to the bilateral representation of speech sound categories argued for by Hickok and Poeppel.
In summary, hemispheric asymmetries for speech and music perception certainly exist, but are more subtle than generally appreciated. Nothing we know about these asymmetries contradicts the idea of shared learning mechanisms for sound categories in the two domains.
The idea of special cognitive and neural mechanisms for speech perception has long been a refrain in speech research, ever since the discovery of categorical perception for phoneme contrasts (Liberman et al., 1957, 1967; cf. Eimas et al., 1971). What is at issue is not semantic or syntactic processing, but the basic mapping between sounds and linguistic categories. The demonstration by Kuhl and Miller (1975) of categorical perception for stop consonants in a South American rodent (the chinchilla) was influential evidence against the “specialized mechanisms” view, though it by no means settled the debate (cf. Pastore et al., 1983; Kluender et al., 1987; Hall & Pastore, 1992; Trout, 2003; Diehl et al., 2004). There are still strong proponents of the idea that we listen to speech using different mechanisms than those used for other sounds (Liberman, 1996).
One of the most dramatic forms of evidence for a special mode of speech perception comes from studies of “sine-wave speech” (Remez et al., 1981). Sine-wave speech is created by tracing the center frequencies of speech formants from a spectrogram (such as in Figure 2.21), and then synthesizing a sine wave for each formant that exactly reproduces the formant’s pattern of frequency change over time. Sine-wave speech based on the first two or three speech formants (F1, F2, and F3) can sound like meaningless streams of frequency variation to a listener who is unfamiliar with this stimulus. However, once primed to hear the stimulus as speech, listeners experience a dramatic perceptual shift and the seemingly aimless series of frequency glides becomes integrated into a coherent speech percept. (Sound Examples 2.8a–b illustrate this phenomenon. Listen to 2.8a, then 2.8b, and then 2.8a again.)29 That is, the listener fuses the multiple frequencies into a single stream of consonants and vowels that make up intelligible words (though the “voice” is still inhuman and electronic-sounding). For those who have experienced it, this shift is compelling evidence that there is a special mode of auditory processing associated with hearing sound patterns as speech. Remez et al. (1994) have presented data suggesting that the perception of sine-wave speech violates Gestalt principles of auditory grouping, and argue for the independence of phonetic versus auditory organizational principles.
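The synthesis procedure described above (one sinusoid per formant, each tracking the formant's center frequency over time) can be caricatured with a phase-accumulation sketch. The formant trajectories below are toy linear glides, not traced from a real spectrogram as in Remez et al. (1981).

```python
import math

def synth_sine_wave_speech(formant_tracks, dur=0.5, sr=8000):
    """Sum one sinusoid per formant track, each following a time-varying frequency.

    formant_tracks: list of (f_start, f_end) pairs in Hz, linearly interpolated
    over the utterance (a toy stand-in for traced formant contours).
    """
    n = int(dur * sr)
    out = [0.0] * n
    for f0, f1 in formant_tracks:
        phase = 0.0
        for i in range(n):
            f = f0 + (f1 - f0) * i / (n - 1)    # instantaneous frequency in Hz
            phase += 2.0 * math.pi * f / sr      # accumulate phase sample by sample
            out[i] += math.sin(phase) / len(formant_tracks)
    return out

# Toy F1/F2/F3 glides (illustrative values only):
signal = synth_sine_wave_speech([(300, 700), (1200, 1800), (2500, 2600)])
```

Phase accumulation (rather than computing sin(2πft) directly) keeps the waveform continuous as the frequency changes, which matters for a glide.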
Although the research of Remez and colleagues provides fascinating evidence for a “speech mode” of perception, there is no logical contradiction between the existence of such a mode and shared processing mechanisms for developing learned sound categories in speech and music. That is, the phenomenon of sine-wave speech (turning acoustic patterns into phonemic ones) assumes that sound categories for speech have already developed. What is remarkable about the phenomenon is that the same auditory signal can either activate the representations of these categories or not, depending on the listener’s expectations.30 Thus there is no reason why the same mechanisms that helped form phonemic categories in language could not also serve the formation of pitch interval categories in music. (If this idea is correct, there may be an analog of sine-wave speech for music, i.e., a signal that sounds like one or more aimless streams of gliding pitch variation until one is primed to hear it as music, in which case the sound is interpreted as a coherent sequence of musical sound categories; cf. Demany & McAnally, 1994.)
The fact that musical and linguistic sound categories are acoustically distinct and neurally dissociable in the adult brain does not logically demand that their development is based on domain-specific learning processes. In fact, from a cognitive perspective, the notion of shared learning mechanisms makes sense, because a similar problem must be solved in both cases. Recall that the sounds of music and speech are not realized in a precise, Platonic fashion. For example, the “same” pitch interval can vary in size both due to chance variation and to the influence of local melodic context (cf. section 2.2.3). Similarly, different tokens of a given consonant or vowel vary in their acoustic structure depending on phonetic context, even in the speech of a single person. Listeners thus need to develop a mental framework that allows them to extract a small number of meaningful categories from acoustically variable signals. In the next two subsections, I discuss studies that suggest that the development of sound categories engages processes shared by speech and music.
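The learning problem just described (extracting a few stable categories from acoustically variable tokens) can be caricatured as unsupervised clustering. The sketch below runs a minimal one-dimensional k-means on synthetic "interval size" tokens; the two underlying categories, their spread, and the use of k-means itself are all illustrative assumptions, not a model proposed in the text.

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Minimal 1-D k-means: group variable tokens around k category centers."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Recompute each center as its cluster mean (keep old center if empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# Synthetic "tokens": noisy realizations of two underlying categories,
# e.g., interval sizes near 2 and 4 semitones (values illustrative).
rng = random.Random(1)
tokens = [rng.gauss(2.0, 0.2) for _ in range(100)] + \
         [rng.gauss(4.0, 0.2) for _ in range(100)]
centers = kmeans_1d(tokens, 2)
```

Despite the token-to-token variability, the recovered centers fall near the underlying category values, which is the essence of the category-extraction problem facing the listener.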
If there are shared mechanisms for learning the sound categories of speech and music, then individual variation in the efficacy of these mechanisms should influence both domains. That is, an individual’s ability to learn sound categories in one domain should have some predictive power with regard to sound category learning in the other domain. Recent empirical research on children and adults supports this prediction, because it finds that pitch-related musical abilities predict phonological skills in language.
Anvari et al. (2002) studied the relation between early reading skills and musical development in a large sample of English-speaking 4- and 5-year-olds (50 children per group). Learning to read English requires mapping visual symbols onto phonemic contrasts, and thus taps into linguistic sound categorization skills. The children were given an extensive battery of tasks that included tests of reading, phonemic awareness, vocabulary, auditory memory, and mathematics. (Phonemic awareness refers to the ability to identify the sound components of a word, and a large body of research indicates that children with greater phonemic awareness have advantages in learning to read.) On the musical side, both musical pitch and rhythm discrimination were tested. The pitch tasks included same/different discrimination of short melodies and chords, and the rhythm tasks involved same/different discrimination of short rhythmic patterns and reproduction of rhythms by singing. The most interesting findings concerned the 5-year-olds. For this group, performance on musical pitch (but not rhythm) tasks predicted unique variance in reading abilities, even when phonemic awareness was controlled for. Furthermore, statistical analysis showed that this relation could not be accounted for via the indirect influence of other variables such as auditory memory. Such a finding is consistent with the idea of shared learning processes for linguistic and musical sound categories. Individual variation in the efficacy of these processes could be due to internal factors or to environmental conditions that influence the quality of auditory input (cf. Chang & Merzenich, 2003).
Turning to research on adults, Slevc and Miyake (2006) examined the relationship between proficiency in a second language (L2) and musical ability. In contrast to previous studies on this topic, Slevc and Miyake measured both linguistic and musical skills in a quantitative fashion, and also measured other potentially confounding variables. Working with a group of 50 Japanese adult learners of English living in the United States, they administered language tests that examined receptive and productive phonology, syntax, and lexical knowledge. (The tests of receptive phonology included identifying words that differed by a single phoneme, e.g., “clown” vs. “crown.”) The musical tests examined pitch pattern perception, for example, via the detection of an altered note in a chord or in a short melody, as well as accuracy in singing back short melodies. Crucially, they also measured other variables associated with second-language proficiency, such as age of arrival in the foreign country, number of years spent living there, amount of time spent speaking the second language, and phonological short-term memory in the native language. The question of interest was whether musical ability could account for variance in L2 ability beyond that accounted for by these other variables. As in Anvari et al. (2002), the authors used hierarchical regression to tease apart the influence of different variables. The results were clear: Musical ability did in fact predict unique variance in L2 skills. Most relevant for the current discussion, this predictive relationship was confined to L2 receptive and productive phonology, in other words, to that aspect of language most directly related to sound categorization skills in perception.
The studies of Anvari et al. and Slevc and Miyake are notable because they suggest a specific link between sound categorization skills in speech and music, consistent with the idea of shared mechanisms for sound category formation in the two domains. However, their musical tasks were not specifically designed to test sensitivity to musical sound categories and there is thus a need for more refined studies along these lines. One way to do this would be to test sensitivity to pitch intervals, for example, via discrimination of melodies with the same melodic contour but different interval patterns, using transposition to eliminate absolute pitch as a possible cue (cf. Trainor, Tsang, & Cheung, 2002).31 It would also be desirable to test low-level auditory processing skills (such as temporal resolution and pitch change discrimination), to determine whether sound categorization skills in the two domains are predicted by raw auditory sensitivity (cf. Ramus, 2003, 2006; Rosen, 2003). For example, Wong et al. (2007) have recently found evidence that musical training sharpens the subcortical sensory encoding of linguistic pitch patterns (cf. Overy, 2003; Tallal & Gaab, 2006; Patel & Iversen, 2007).
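The role of transposition in such a design can be illustrated with a quick calculation: multiplying every frequency of a melody by a constant changes all the absolute pitches but leaves the semitone interval pattern untouched, so interval sensitivity becomes the only reliable cue. The melody below is a hypothetical example, not a stimulus from any of the cited studies.

```python
import math

def intervals_st(freqs):
    """Successive pitch intervals of a melody, in semitones."""
    return [12 * math.log2(b / a) for a, b in zip(freqs, freqs[1:])]

# A hypothetical 4-note melody (Hz), then the same melody transposed up 3 semitones.
melody = [440.0, 494.0, 440.0, 330.0]
ratio = 2 ** (3 / 12)
transposed = [f * ratio for f in melody]

# Every absolute pitch changes, but the interval pattern is identical, so a
# listener comparing the two versions cannot rely on absolute frequency.
print(intervals_st(melody))
print(intervals_st(transposed))
```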
If there is overlap in the mechanisms that the brain uses to convert sound waves into discrete sound categories in speech and music, then it is conceivable that exercising these mechanisms with sounds from one domain could enhance the ability of these mechanisms to acquire sound categories in the other domain. One way to investigate this question is to ask whether extensive exposure to instrumental music in infants influences language development, especially with regard to the acquisition of phonemic categories (J. Saffran, personal communication). If so, such a finding would have considerable practical significance.
Our perception of the differences between speech sounds (or musical sounds) is not a simple reflection of their raw acoustic differences, but is influenced by their relationship to our learned sound category system (cf. sections 2.2.3 and 2.3.4). Although most early work on the influence of speech categories on auditory perception focused on categorical perception, more recent research has focused on a different phenomenon known as the perceptual magnet effect (PME). The importance of the PME to the current discussion is that it is thought to be driven by a domain-general learning mechanism, in other words, distribution-based learning based on the statistics of the input (Kuhl et al., 1992; Guenther & Gjaja, 1996). Thus if linguistic and musical sound category learning involve shared mechanisms, the PME should be observed in music as well as in speech. This is in fact the case, as described below. First, however, I provide some background on the PME for those unfamiliar with this phenomenon.
The PME is concerned with perception within sound categories rather than with perception of contrasts between sound categories, and was first described by Kuhl (1991) in the context of vowel perception. Kuhl synthesized a range of vowel stimuli, all intended to be variants of the vowel /i/ (variants differed in the frequency of the first and second formants, F1 and F2). She asked listeners to rate these stimuli as good versus poor exemplars of /i/ using a 7-point scale. The results showed that there was a certain location in (F1, F2) space where vowels received the highest rating. Kuhl chose the highest-rated vowel as the “prototype” (P) vowel. A vowel with a low goodness rating in a different region of (F1, F2) space was designated the “nonprototype” (NP). In a subsequent perception experiment, listeners heard either P or NP repeating as a background stimulus, and had to indicate when this sound changed to another version of /i/. The basic finding was that listeners were less sensitive to sound change when the prototype served as the standard, as if the prototype was acting like a perceptual “magnet” that warped the space around it. Kuhl found that infants also showed this effect (though to a lesser extent than adults) but that rhesus monkeys did not, suggesting that this effect might be unique to speech. Subsequent work by Kuhl and others questioned and refined the concept of the PME, and introduced a number of methodological improvements (Lively & Pisoni, 1997; Iverson & Kuhl, 2000).
The PME has proved somewhat contentious in speech research (e.g., Lotto et al., 1998; Guenther, 2000), but the basic effect seems to be robust when experimental conditions are appropriate (Hawkins & Barrett-Jones, in preparation). Before turning to musical studies of the PME, several points should be mentioned. First, the term “prototype” should not be taken to imply a preexisting Platonic category in the mind, but can be thought of as a point of central tendency in a distribution of exemplars. Second, the distribution of exemplars is itself subject to contextual effects such as the speaker’s gender, age, speaking rate, and phonetic context, for which listeners may normalize (see section 2.3.3, subsection “Timbral Contrasts Among Vowels”). Finally, and most importantly for our current purposes, the PME differs from traditional work on categorical perception in its emphasis on distribution-based statistical learning.
Acker et al. (1995) first tested the PME in music, using chords instead of vowels. Chords are collections of simultaneous pitches, and serve a particularly important role in Western tonal music, as will be discussed in Chapter 5. For the current purposes, one can think of a chord as loosely analogous to a vowel: a complex of frequencies that acts as a perceptual unit. Acker et al. used the C major triad of Western music (C-E-G), constructing prototypical and nonprototypical versions of this chord by varying the tuning of the constituent tones E and G. When asked to perform discrimination tasks in which a prototypical versus nonprototypical (slightly mistuned) version of the C major triad served as the reference, listeners performed better in the vicinity of the prototype, precisely the opposite of what was found in language. Acker et al. therefore suggested that in music, category prototypes acted as anchors rather than magnets.
Based on this work, it appeared that the PME might not apply to music. However, subsequent research has shown that the results of Acker et al. may not be representative of most listeners. This is because these researchers had only studied musically trained subjects. More recently, Barrett (1997, 2000) performed a similar study, but included both musicians and nonmusicians. She replicated the earlier results for musicians, but found that nonmusicians showed worse discrimination in the vicinity of the category prototype, in other words, the classic perceptual magnet effect. How could this be explained?
Barrett proposed an interesting solution to this puzzle. She argued that musicians are trained to pay a good deal of attention to sounds in the vicinity of prototypes in order to tune their instruments. Nonmusicians, in contrast, are well served if they can simply distinguish one chord from another, and likely pay little attention to the precise tuning of chords. Barrett hypothesized that in general, listeners expend no more energy than necessary in focusing their attention on acoustic details within a category.32 Barrett proposed that differences in attention to prototypes result in prototypical chords acting as “perceptual repellors” for musicians and “perceptual attractors” for nonmusicians. She thus refined the notion of a “perceptual magnet” and proposed that aspects of the listener determine whether a magnet acts as an attractor or a repellor.
In this light, recent research on the PME in nonlinguistic sounds has proved particularly interesting. Guenther et al. (1999) conducted a study in which listeners heard narrow-band filtered white noise with different center frequencies. One group received categorization training, in which they learned to identify stimuli in a certain frequency region as members of a single category. The other group received discrimination training, and practiced telling stimuli in this region apart (with feedback). When the two groups were tested on a subsequent discrimination task, the categorization group showed a reduction in their ability to discriminate stimuli in the training region versus in a control region, in other words, a PME. In contrast, the discrimination group showed the reverse effect. In subsequent research, Guenther and colleagues have combined neural network models of auditory cortex with fMRI studies in order to probe the neural mechanisms of the PME. Based on this work, they have suggested that a clumped distribution of exemplars, combined with a drive to sort stimuli into perceptually relevant categories, results in changes to auditory cortical maps such that prototypes have smaller representations than nonprototypes (Guenther et al., 2004). According to this model, prototypes are more difficult to discriminate from neighboring tokens simply because they are represented by fewer cortical cells. Importantly, the researchers see this as a domain-general mechanism for learning sound categories, an idea supported by their work with nonspeech stimuli.
In summary, comparative PME research supports the notion of domain-general developmental mechanisms for sound category perception, and illustrates how our understanding of sound categorization processes can benefit from research that flows seamlessly across linguistic and musical auditory processing.
A well-established idea in research on speech is that the perceptual sensitivities of an infant become tuned to the native sound system. Recall from section 2.3.4 (subsection on consonants) that infants are initially sensitive to many subtle phonetic contrasts, including those that do not occur in their native language. Early in life they lose sensitivity to nonnative contrasts as they learn the sounds of their native language. In other words, the infant goes from being a “citizen of the world” to being a member of a specific culture.
Is a similar pattern observed in musical development? If so, this could suggest common mechanisms for sound category learning in the two domains. In fact, there is one study that seems to indicate a similar developmental process in music. However, there are problems with this study. I review the study here because it is often cited as support for the notion that musical development involves a decay in sensitivities to nonnative pitch intervals, when in fact this has yet to be clearly demonstrated.
Lynch et al. (1990) studied the ability of American infants and adults to detect mistunings in melodies based on culturally familiar versus unfamiliar musical scales/intervals. The familiar scales were the Western major and minor scales, and the unfamiliar scale was the Javanese pelog scale, which contains different intervals. A 7-note melody based on a given scale was played repeatedly, and on some repetitions the 5th note (the highest note in the melody) was mistuned by raising its frequency. Infants and adults were compared on their ability to detect these changes. In a finding reminiscent of classic work on speech perception (Werker & Tees, 1984), infants showed equally good discrimination with the familiar and unfamiliar scale, whereas adult nonmusicians showed better performance on the familiar scales.
However, subsequent research by Lynch and others changed the picture. Using a different version of their testing paradigm in which the mistuning was randomly located rather than always on the same note, Lynch and Eilers (1992) found that 6-month-old Western babies did in fact show a processing advantage for the culturally familiar scale, a finding that was replicated a few years later (Lynch et al., 1995). They speculated that this increase in task demands may have uncovered an innate processing advantage for the simpler frequency ratios found in the Western versus Javanese scales.
It should be noted that the findings of Lynch and colleagues are tempered by a particular methodological choice that may have influenced their results. In the studies described above, the repeating background melody always occurred at the same absolute pitch level. This choice has been criticized by other researchers (Trainor & Trehub, 1992), who point out that only when the background melody is continuously transposed can one be sure that discrimination is really based on sensitivity to intervals, and not on absolute frequencies.
Thus although the Lynch et al. studies were pioneering in aiming to do comparable research on the development of pitch categories in music and phonological categories in speech, well-controlled studies of this issue have yet to be done. The following section suggests a specific direction that such studies might take.
Sections 2.4.2 and 2.4.3 above reviewed evidence consistent with the hypothesis that music and language share mechanisms for the formation of learned sound categories (the “shared sound category learning mechanism hypothesis,” or SSCLMH; cf. section 2.4.1). One such mechanism was already alluded to in section 2.4.3 when discussing the perceptual magnet effect. This is statistical learning, which involves tracking patterns in the environment and acquiring implicit knowledge of their statistical properties, without any direct feedback. That is, statistical learning is driven by distributional information in the input rather than by explicit tutoring.
Statistical learning is a good candidate for a sound category learning mechanism shared by speech and music. Statistical learning has already been demonstrated for other aspects of language learning, such as segmentation of word boundaries from sequences of syllables (Saffran et al., 1996). Furthermore, empirical work suggests that statistical learning plays a role in various aspects of music perception, ranging from the creation of tonal hierarchies (cf. Chapter 5) to the shaping of melodic expectations (e.g., Krumhansl, 1990, 2000; Oram & Cuddy, 1995; Cohen, 2000; Huron, 2006). The question, then, is how one would go about exploring whether music and speech both rely on statistical learning in the development of sound categories.
One idea for comparative research is to build on existing findings regarding the role of statistical learning of speech sound categories. It is well established that experience with a native language influences phonetic discrimination (cf. section 2.3.4). For example, over time infants lose sensitivity to certain phonetic contrasts that do not occur in their language, and gain sensitivity for other, difficult phonetic contrasts in their native tongue (Polka et al., 2001; cf. Kuhl et al., in press). Recently there has been growing interest in the mechanisms that account for these perceptual changes. Maye et al. (2002, 2003, in press) have addressed this issue via speech perception studies with infants and adults. In these studies, two sets of participants are exposed to different distributions of the same set of speech tokens, and then tested for their ability to discriminate between specific tokens from this set. Two such distributions are illustrated in Figure 2.25.
The x-axis of the figure shows the acoustic continuum along which speech stimuli are organized, namely, voice onset time (VOT, the time between the release of a consonant and the onset of vibration of the vocal folds). In this case, the stimuli form a continuum between a prevoiced /da/ on the left edge and a short-lag /ta/ on the right end. One group of participants heard tokens that followed a bimodal distribution (dashed line), whereas the other heard a unimodal distribution (solid line).
The key aspect of this design is that there are tokens that occur with equal frequency in the two distributions (e.g., the tokens with VOT = –50 and 7 ms, in other words, tokens 3 and 6 counting from the left edge of the x-axis). The question of interest is whether the ability to discriminate these tokens depends on the distribution in which they are embedded. Maye and Weiss (2003) addressed this question with 8-month-old infants. The researchers found that only infants exposed to the bimodal distribution showed evidence of discriminating tokens 3 and 6, whereas infants exposed to the unimodal distribution (and a control group with no exposure) did not discriminate these tokens. Importantly, the amount of exposure prior to test was small, less than 3 minutes. This converges with other research showing that statistical learning is a remarkably rapid, powerful learning mechanism (e.g., Saffran et al., 1996).
Figure 2.25 Distribution of stimuli along an acoustic continuum from /da/ to /ta/. One group of infants heard stimuli from a bimodal distribution (dashed line), whereas another group heard stimuli from a unimodal distribution (solid line). Importantly, both groups heard tokens 3 and 6 (-50 ms and 7 ms VOT, respectively) equally often.
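The key design constraint — the critical tokens equated across conditions while the overall shape of the distribution differs — can be sketched with hypothetical exposure counts (the numbers below are illustrative, not Maye et al.'s actual values):

```python
# Hypothetical exposure counts for an 8-token VOT continuum
# (token 1 = most prevoiced /da/, token 8 = longest-lag /ta/),
# loosely after the design of Figure 2.25.
bimodal  = [8, 16, 24, 16, 16, 24, 16, 8]   # peaks at tokens 3 and 6
unimodal = [4, 10, 24, 26, 26, 24, 10, 4]   # single central peak

def mode_count(dist):
    """Count local maxima — a crude check of the distribution's shape."""
    return sum(1 for i in range(1, len(dist) - 1)
               if dist[i - 1] < dist[i] >= dist[i + 1])

# The critical tokens (3 and 6) occur equally often in both conditions,
# and total exposure is matched; only the distributional shape differs.
print(bimodal[2], unimodal[2])   # token 3 counts
print(bimodal[5], unimodal[5])   # token 6 counts
print(mode_count(bimodal), mode_count(unimodal))  # 2 1
```

Any difference in how well the two groups later discriminate tokens 3 and 6 can therefore be attributed to the shape of the distribution rather than to raw familiarity with those tokens.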
It would be a straightforward matter to construct an experiment similar to this using musical materials, such as an interval that varied between a major third and a minor third in terms of pitch steps. (Alternatively, one could use two intervals that do not occur in Western music, if one wanted to avoid culturally familiar materials.) As a practical issue, it will be of interest to see whether neural measures (such as the mismatch negativity; cf. sections 2.2.3 and 2.3.4, subsections on brain responses) can complement behavioral techniques as assays of discrimination, because the former may ultimately prove easier to collect from infants (cf. Trainor et al., 2003; McMullen & Saffran, 2004; Kuhl et al., in press).
Taking a step back, comparative studies of sound category formation in speech and music are worth pursuing because they can identify general principles by which the brain creates and maintains sound categories. Furthermore, comparative work can help address the emerging idea that the mental framework for sound perception is not a frozen thing in the mind, but is adaptive and is constantly tuning itself (Kraljic & Samuel, 2005; Luce & McLennan, 2005; McQueen et al., 2006). This perspective is in line with the observation of substantial individual variation in perception of nonnative speech contrasts (Best et al., 2001), and a large degree of individual variation in musical sound categories, even among trained musicians (e.g., Perlman & Krumhansl, 1996, and references therein).
Linguistic and musical sound systems illustrate a common theme in the study of music-language relations. On the surface, the two domains are dramatically different. Music uses pitch in ways that speech does not, and speech organizes timbre to a degree seldom seen in music. Yet beneath these differences lie deep connections in terms of cognitive and neural processing. Most notably, in both domains the mind interacts with one particular aspect of sound (pitch in music, and timbre in speech) to create a perceptually discretized system. Importantly, this perceptual discretization is not an automatic byproduct of human auditory perception. For example, linguistic and musical sequences present the ear with continuous variations in amplitude, yet loudness is not perceived in terms of discrete categories. Instead, the perceptual discretization of musical pitch and linguistic timbre reflects the activity of a powerful cognitive system, built to separate within-category sonic variation from differences that indicate a change in sound category. Although music and speech differ in the primary acoustic feature used for sound category formation, it appears that the mechanisms that create and maintain learned sound categories in the two domains may have a substantial degree of overlap. Such overlap has implications for both practical and theoretical issues surrounding human communicative development.
In the 20th century, relations between spoken and musical sound systems were largely explored by artists. For example, the boundary between the domains played an important role in innovative works such as Schoenberg’s Pierrot Lunaire and Reich’s Different Trains (cf. Risset, 1991). In the 21st century, science is finally beginning to catch up, as relations between spoken and musical sound systems prove themselves to be a fruitful domain for research in cognitive neuroscience. Such work has already begun to yield new insights into our species’ uniquely powerful communicative abilities.
This is an appendix for section 2.2.2.
Pitch is one of the most salient perceptual aspects of a sound, defined as “that property of a sound that enables it to be ordered on a scale going from low to high” (Acoustical Society of America Standard Acoustical Terminology; cf. Randel, 1978). The physical correlate of pitch is frequency (in cycles per second, or Hertz), and in a constant-frequency pure tone, the pitch is essentially equal to the frequency of the tone. (A pure tone contains a single frequency, as in a sinusoidal sound wave. Apart from frequency, duration and amplitude play a small role in the pitch of pure tones.)
In the case of periodic sounds made from a fundamental frequency and a series of upper harmonics that are integer multiples of this fundamental (such as the sound of a clarinet or of the human voice uttering a vowel), the pitch corresponds to the fundamental frequency. In sounds with a more complex spectral structure, pitch is not always a simple reflection of the physical frequency structure of the sound. This is illustrated by the phenomenon of the “missing fundamental,” in which the perceived pitch is lower than any of the frequency components of the sound.
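A toy case makes the phenomenon concrete. If the components of a sound are harmonics 2, 3, and 4 of a 200 Hz fundamental, listeners typically hear a pitch near 200 Hz even though no energy is present at that frequency; for exactly harmonic components like these, the implied fundamental is simply their greatest common divisor. (This integer arithmetic is only a sketch with illustrative frequencies; real pitch estimation is far less tidy.)

```python
from functools import reduce
from math import gcd

# Components of a harmonic complex whose fundamental has been removed:
# these are harmonics 2, 3, and 4 of a 200 Hz fundamental.
components = [400, 600, 800]

# For exactly harmonic components, the implied fundamental is their GCD.
implied_f0 = reduce(gcd, components)
print(implied_f0)  # 200 — a frequency physically absent from the sound
```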
The complex nature of pitch perception was driven home to me when a colleague and I were conducting neural experiments on the perception of tones with three frequency components: 650, 950, and 1250 Hz. We asked participants in our study to identify the perceived pitch of this composite tone by adjusting a pure tone until it matched the pitch of the composite. To our surprise, some of the subjects placed the pure tone at exactly 650 Hz, whereas others placed it much lower, near 325 Hz (a “missing fundamental”). Both groups of subjects were utterly confident of their judgment, and measurements of their brain activity showed differences between those who heard the missing fundamental and those who did not. This illustrates how perceived pitch is not determined by the physical makeup of a sound, but is a perceptual construct (Patel & Balaban, 2001).
Pitch perception has become a subfield of research in its own right, and is an active area in which psychologists, neuroscientists, and engineers regularly interact (Shamma & Klein, 2000; Moore, 2001; Plack et al., 2005).
This is an appendix for section 2.2.2, subsection “Introduction to Musical Scales.”
The exact size of a semitone (st) can be computed as:
1 st = 2^(1/12) ≈ 1.0595
In this equation, the 2 represents the doubling of frequency in one octave, and the 1/12 represents a division of this frequency range into 12 equal-sized steps on the basis of pitch ratios.
Because the semitone is a ratio-based measure, the ratio between any two frequencies F1 and F2 (in Hz) can be expressed in semitones as:
st = 12 × log2(F1/F2).
To compute this same interval in cents (c):
c = 1,200 × log2(F1/F2)
To convert a distance of X semitones between two frequencies F1 and F2 into a frequency ratio R:
R = 2^(X/12)
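These conversions are easy to check numerically. A short sketch (the tone frequencies are chosen arbitrarily for illustration):

```python
import math

def st_between(f1, f2):
    """Interval in semitones between frequencies F1 and F2: st = 12 * log2(F1/F2)."""
    return 12 * math.log2(f1 / f2)

def cents_between(f1, f2):
    """The same interval in cents: c = 1200 * log2(F1/F2)."""
    return 1200 * math.log2(f1 / f2)

def ratio_from_st(x):
    """Frequency ratio R spanned by x semitones: R = 2^(x/12)."""
    return 2 ** (x / 12)

# One equal-tempered semitone is a ratio of about 1.0595:
print(round(ratio_from_st(1), 4))          # 1.0595

# Seven semitones (the equal-tempered fifth) is very close to the just ratio 3:2:
print(round(ratio_from_st(7), 4))          # 1.4983

# 660 Hz against 440 Hz is an exact 3:2, about 7.02 st or 702 cents:
print(round(st_between(660, 440), 2))      # 7.02
print(round(cents_between(660, 440), 2))   # 701.96
```

The last two lines show why the equal-tempered fifth is often described as about 2 cents flat of the just fifth.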
This is an appendix for section 2.2.2, subsection “Introduction to Musical Scales.”
In order to get a flavor for theories that posit a special perceptual status for certain pitch intervals, we will focus largely on one interval: the musical fifth. After the octave, the fifth is the most important interval in Western music, and also serves as an important interval in a wide range of other musical traditions, ranging from large musical cultures in India and China to the music of small tribes in the islands of Oceania (Jairazbhoy, 1995; Koon, 1979; Zemp, 1981). Some of the explanations for the special status of the fifth are given below.
One idea for the special status of certain intervals (such as the fifth) dates back to Hermann von Helmholtz (1885), and is related to the interaction of different frequency components of a sound. It has long been known that when two pure tones close in frequency are sounded simultaneously (for example, A3 and A♯3, or 220 Hz and 233.08 Hz), the result is a composite tone with amplitude fluctuation (or “beating”) due to the physical interference of the two sound waves. As the frequencies are moved apart, this physical interference begins to subside, but there is still a sensation of roughness due to the fact that the two tones are not fully resolved by the auditory system. That is, within the auditory system the two tones lie within a “critical band,” exciting similar portions of the basilar membrane (the membrane that helps resolve sounds into their frequency components as part of sending auditory signals to the brain). Once the tones are no longer within the same critical band (approximately 3 semitones), the sensation of roughness disappears.
Generalizing this idea, if one plays two complex tones simultaneously, each of which consists of a fundamental and its harmonics, the harmonics of Tone 1 that lie close to the harmonics of Tone 2 will interact with them to create roughness. The sensory consonance/dissonance theory of musical intervals predicts that the overall dissonance of two simultaneous complex tones is equal to the sum of the roughness produced by each pair of interacting harmonics. According to this theory, the octave is maximally consonant because the fundamental and all the harmonics of the upper tone line up exactly with harmonics of the lower (Figure A.1), and the fifth is the next most consonant interval because the frequency components of the upper tone either line up with the harmonics of the lower or fall far enough between them to avoid interaction (Figure A.2).
Figure A.1 Alignment of frequency partials of two harmonic tones whose fundamentals are one octave apart: A3 (left column, fundamental frequency of 220 Hz) and A4 (right column, fundamental frequency of 440 Hz).
In contrast, the minor second (an interval of 1 semitone) is highly dissonant because several of the harmonics of the two tones lie close enough to each other in frequency to create roughness (Figure A.3).
One appealing aspect of this theory is that it can provide a quantitative ranking of perceived consonance of different musical intervals between complex harmonic tones, and this prediction can be tested against perceptual data in which listeners rate consonance or dissonance of these same intervals (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a, b). For example, the theory predicts high consonance for the fifth and fourth and low consonance/high dissonance for the minor second (a one semitone interval) and major seventh (an 11 semitone interval), predictions that accord well with intuition and experiment.
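The additive logic of the theory can be made concrete with a toy model. The sketch below (Python) is only an illustration of the sum-of-roughness idea, not Plomp and Levelt's measured curves: the 1/k amplitude rolloff for the harmonics and the roughness function that falls linearly to zero at a separation of 3 semitones (the approximate critical band mentioned earlier) are my own simplifying assumptions.

```python
import math

def partials(f0, n=10):
    # (frequency, amplitude) pairs for the first n harmonics of a
    # complex tone, with an assumed 1/k amplitude rolloff.
    return [(f0 * k, 1.0 / k) for k in range(1, n + 1)]

def pair_roughness(fa, fb):
    # Toy roughness for one pair of partials (an assumption, not
    # Plomp & Levelt's data): falls linearly from 1 at near-unison
    # to 0 at a separation of 3 semitones; exact coincidences
    # contribute nothing.
    st = abs(12 * math.log2(fa / fb))
    return max(0.0, 1.0 - st / 3.0) if st > 1e-9 else 0.0

def dissonance(f_lower, f_upper):
    # Total dissonance = sum of roughness over all pairs of partials,
    # weighted by the product of their amplitudes.
    return sum(aa * ab * pair_roughness(fa, fb)
               for fa, aa in partials(f_lower)
               for fb, ab in partials(f_upper))

for name, ratio in [("octave", 2.0),
                    ("fifth", 3 / 2),
                    ("minor second", 2 ** (1 / 12))]:
    print(f"{name:12s} {dissonance(220.0, 220.0 * ratio):.3f}")
```

With these assumptions the model reproduces the qualitative ranking described in the text: the octave comes out smoothest, the fifth next, and the minor second far rougher.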
It should be noted that research on sensory consonance and dissonance has had its shortcomings. One such shortcoming is that the task given to listeners in judging tone pairs often confounds aesthetic and sensory qualities of a stimulus. Participants are often asked to judge “how pleasing” or “agreeable” a given interval is, with the assumption that intervals that are more consonant are more pleasing. In fact, there may be individual differences in the degree to which people prefer consonant or dissonant intervals. It is known, for example, that there are cultures in which rough-sounding intervals such as the second are considered highly pleasing, as in certain types of polyphonic vocal music in Bulgaria. Presumably Bulgarian listeners hear these small intervals as rough, but they find this attractive rather than unattractive (cf. Vassilakis, 2005). Thus in their instructions to listeners, researchers need to clearly distinguish sensory attributes of roughness or smoothness from aesthetic judgment about these qualities, or else risk confounding two different sources of variability in individual responses. (An analogy can be made to the perception of flavor in food. Some cultures find spicy food distasteful, whereas others find it attractive. Thus in a taste test with spicy foods, one needs to be clear about whether one is asking the taster to rate how “pleasant” they find the food versus how “spicy” they find it. Both cultures will likely agree on which foods are spicy, but may disagree on whether this is appealing or not.) Another weakness of this research is that the predictions of the model, which should be culture-independent, have been tested only against data collected in contexts in which Western European music is widely heard (Germany, Japan, and the United States). Thus newer data, collected in a standardized way in a variety of cultural contexts, is needed to test the model adequately.
Figure A.2 Alignment of frequency partials of two harmonic tones whose fundamentals are a fifth (seven semitones) apart: A3 (left column, fundamental frequency of 220 Hz) and E4 (right column, fundamental frequency of ~ 330 Hz).
Figure A.3 Alignment of frequency partials of two harmonic tones whose fundamentals are a minor second (one semitone) apart: A3 (left column, fundamental frequency of 220 Hz) and A♯3 (right column, fundamental frequency of ~ 233 Hz).
In the sensory consonance/dissonance theory of intervals described above, the special status of the fifth is its perceptual smoothness, which is second only to the octave. But what is smoothness? A lack of roughness. Thus the theory is ultimately framed in negative terms. An alternative idea for the prevalence of the fifth is based on the idea that this interval has a positive quality that makes it attractive. One recent theory along these lines focuses on patterns of neural activity in the auditory nerve, and in particular the temporal structure of neural impulses that result from different combinations of tones. Tramo et al. (2003) have presented data suggesting that the fifth generates a neural pattern that invokes not only the pitches of the lower and upper notes of the interval, but also a pitch one octave below the lower note, and other harmonically related notes. In contrast, the neural pattern to dissonant intervals such as the minor second does not suggest any clear pitches (cf. Cariani, 2004).
One might call this the “pitch-relationship theory” of musical intervals (cf. Parncutt, 1989). According to this theory, the fifth is special because it projects a clear sense of multiple, related pitches, including the “missing fundamental” that lies an octave below the lower note.33 Of course, to distinguish it from the sensory consonance/dissonance theory, it is necessary to examine other intervals in which the two theories make different predictions. To date, no such direct comparison has been made. A direct comparison might be possible using the perception of three simultaneous tones (triadic chords) that have perceptually distinct qualities. For example, the major triad (e.g., C-E-G, three tones separated by intervals of 4 and 3 st) is considered far more consonant and stable than the augmented triad (e.g., C-E-G♯, three tones separated by intervals of 4 and 4 st). This difference is not predicted on the basis of interacting frequency partials (Parncutt, 1989; Cook, 2002), and may be better accounted for by the pitch-relationship theory (cf. Cook & Fujisawa, 2006).
Many complex periodic sounds, including the sounds made by vibrating strings and by the human vocal folds, have a harmonic structure, consisting of a discrete set of frequencies (“partials”) with one lowest frequency (the fundamental) accompanied by a set of upper partials that are integer multiples of this fundamental frequency (harmonics of the fundamental). Consider a harmonic sound with a fundamental frequency of 100 Hz. The harmonics will be 200 Hz, 300 Hz, 400 Hz, and so on. Taking the frequency ratios between successive frequencies gives 1:2, 2:3, 3:4, and so on. Thus the intervals of the octave, fifth, fourth, and so on are in every harmonic sound encountered by the auditory system.
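The successive ratios can be tabulated directly. A small sketch (the 100 Hz fundamental is just an example):

```python
import math

f0 = 100.0  # fundamental frequency (Hz)
harmonics = [f0 * k for k in range(1, 7)]

# Interval between each pair of successive harmonics
for lower, upper in zip(harmonics, harmonics[1:]):
    st = 12 * math.log2(upper / lower)
    print(f"{lower:4.0f} Hz -> {upper:4.0f} Hz   "
          f"ratio {upper / lower:.3f}   {st:4.1f} semitones")
```

The first three steps are the octave (12 st), the fifth (~7 st), and the fourth (~5 st), mirroring the 1:2, 2:3, and 3:4 ratios described above.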
This observation has often been used to claim a natural basis for the intervals that constitute the Western major scale, with the implicit assumption that the order in which the intervals occur in the harmonic series is related to the perceived consonance of tone combinations with those frequency ratios. The exact form that this argument has taken has varied over time, with different people emphasizing different sources of harmonic sounds. Some have focused on vibrating strings (Bernstein, 1976), with the implication that musical intervals are a fact of nature, whereas others have focused on the voice (Terhardt, 1984), with the implication that appreciation of these intervals arises through exposure to the harmonic structure of speech. In one recent variant of the “argument from speech” (Schwartz et al., 2003), researchers have studied the frequency structure of running speech and found that the frequency partial with the most energy in the voice is often not the fundamental (frequency partial #1), but partial #2 or #4. Like all even-numbered partials, these two partials have an upper partial that is 3/2 their frequency (for example, partial #6 is always 3/2 the frequency of partial #4). Thus the researchers were able to show empirically that the frequency with the strongest energy peak in the speech spectrum is often accompanied by another concentration of energy at an interval of a fifth above this peak. The authors then argue that the fifth has a special status in music precisely because it is “speech-like” in a statistical sense, and hence highly familiar. They then apply similar arguments to other intervals of the Western musical scale.
The three preceding sections have outlined different approaches to explaining why one particular interval (the fifth) is important in musical scales. All three approaches have a universalist flavor, which is reasonable because this interval appears in the musical scales of many diverse cultures, suggesting that something about the human auditory system biases people toward choosing this interval in building pitch relations in their musical sound systems. Which theory best explains the prevalence of the fifth is an open question. Indeed, the theories are not mutually exclusive, and all the forces described above may play a role.
Although there is good reason to believe that the fifth has a basis in auditory perception, can the other intervals of the Western major scale be explained in this universalist framework? Those who study music in non-Western cultures are justifiably skeptical of strong universalist forces due to the variability in scale systems. Furthermore, limitations of the above approaches in explaining even the Western scale have been noted (see, for example, Lerdahl & Jackendoff, 1983:290, for a critique of the overtone theory). To determine which of these theories has the most merit in a cross-cultural framework will require cross-cultural data on interval perception, collected using consistent methods. Also, the theories will need to make sufficiently different quantitative predictions about interval perception, or else no amount of empirical research will be able to distinguish between them.
What directions can future research on musical intervals take? For those interested in the idea of universal auditory predispositions for certain musical intervals, one direction involves testing individuals unfamiliar with certain intervals to see whether they nevertheless show some special perceptual response to them. This has been the motivation behind research on musical interval perception in infants. For example, Schellenberg and Trehub (1996) tested the ability of 6-month-old Canadian infants to detect a change in the size of a repeating melodic interval (i.e., an interval formed between two sequentially presented sounds). They found that detection was better when the standard interval was a fifth or fourth and the deviant interval was a tritone than vice versa, and suggested that this demonstrated an innate bias for the former intervals, which are based on small whole-number ratios. Unfortunately, one cannot rule out the influence of prior musical exposure on this result, because before 6 months of age infants are exposed to a good deal of music, especially lullabies (Unyk et al., 1992). In Western melodies, intervals of a fifth or fourth are more common than of a tritone (Vos & Troost, 1989), meaning that the processing advantage of the former intervals could simply reflect greater prior exposure to these intervals. Thus the result would be much stronger if infants came from households in which there was no ambient music.34 Alternatively, Hauser and McDermott (2003) have suggested that one way to address innate biases in interval perception is to test nonhuman primates whose exposure to music and speech has been tightly controlled. If such primates showed a processing advantage for the musical fifth, despite lack of any exposure to music or speech, this would be strong evidence that certain intervals are innately favored by the primate auditory nervous system.35
This is an appendix for section 2.3.2, subsection “A Closer Look at Pitch Contrasts Between Level Tones in Tone Languages.”
In mathematical terms:
T = 100 × (F – L)/R
Where T = the tone’s range-normalized scaling in percent, F = the tone frequency, L = the bottom of the individual’s speaking range, and R = the speaking range. F, L, and R are expressed in Hz.
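A minimal numeric check of this formula, using a hypothetical speaker whose voice spans 100-250 Hz:

```python
def range_normalized(f, low, high):
    # T = 100 * (F - L) / R, with R = high - low (all values in Hz).
    # Returns the tone's position within the speaking range, in percent.
    return 100.0 * (f - low) / (high - low)

# Hypothetical speaker: speaking range 100-250 Hz (R = 150 Hz)
print(range_normalized(100.0, 100.0, 250.0))  # 0.0   (bottom of range)
print(range_normalized(175.0, 100.0, 250.0))  # 50.0  (middle of range)
print(range_normalized(250.0, 100.0, 250.0))  # 100.0 (top of range)
```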
1 Indeed, one need not leave one’s culture to have this experience. One of the difficulties of appreciating Western European music composed with microtonal scales, such as certain music by Charles Ives or Easley Blackwood, is that it can sound like an out-of-tune version of music composed using familiar scale systems. For those unfamiliar with the concept of a musical scale, see section 2.2.2, subsection “Introduction to Musical Scales.”
2 Pitch differences are commonly described using the metaphor of height, but a cross-cultural perspective reveals that this is not the only metaphor for pitch. For example, instead of “high” and “low” the Havasupai Native Americans (California) use “hard” and “soft” (Hinton, 1984:92), whereas the Kpelle of Liberia use “small” and “large” (Stone, 1982:65; cf. Herzog, 1945:230-231).
3 Timbre perception is also multidimensional. Reasons why it seldom serves as the basis for an organized system of sound categories are discussed later, in section 2.2.4.
4 It should be noted that as frequency increases, listeners prefer a frequency ratio slightly greater than 2:1 in making octave judgments. This “octave stretch” may have a neuro-physiological basis (McKinney & Delgutte, 1999).
5 Note that the similarity of pitches separated by an octave (a 2:1 ratio in pitch) lays the foundation for forming pitch intervals on a logarithmic basis, in other words, on the basis of frequency ratios rather than frequency differences (Dowling, 1978). For possible cases of musical pitch relations organized on a linear rather than a logarithmic basis, see Haeberli’s (1979) discussion of ancient Peruvian panpipes and Will and Ellis’s (1994, 1996) analysis of Australian aboriginal singing.
6 Note that intervals of 7, 5, and 4 semitones do not yield the exact frequency ratios of 3:2, 4:3, and 5:4, respectively, because the equal-tempered scale is a compromise between an attempt to produce such ratios and the desire to move easily between different musical keys (Sethares, 1999).
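The size of this compromise is easy to quantify. A sketch comparing equal-tempered intervals with the just ratios they approximate (the interval names and target ratios follow the note above):

```python
import math

# Equal-tempered interval (in semitones) vs. the just ratio it approximates
targets = [("fifth", 7, 3 / 2),
           ("fourth", 5, 4 / 3),
           ("major third", 4, 5 / 4)]

for name, st, just_ratio in targets:
    tempered = 2 ** (st / 12)
    # Deviation of the tempered interval from the just ratio, in cents
    cents_off = 1200 * math.log2(tempered / just_ratio)
    print(f"{name}: tempered {tempered:.4f} vs just {just_ratio:.4f} "
          f"({cents_off:+.1f} cents)")
```

The tempered fifth is about 2 cents narrower than 3:2, whereas the tempered major third is about 14 cents wider than 5:4.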
7 The American composer Harry Partch (1901-1974) is well known for making scales from even finer subdivisions, and for building beautiful instruments with which to play this music. Toward the end of his life, Partch composed using scales based on a 43-fold division of the octave (Blackburn, 1997; Partch, 1974). The iconoclastic Partch once wryly referred to Western keyboards as “12 black and white bars in front of musical freedom” (Partch, 1991:12).
8 It has been further claimed that when scales have intervals of different sizes, they tend to clump into two distinct size classes, though I am not aware of any empirical evidence on this issue.
9 Debussy’s use of a symmetric scale is an interesting case study in cognitive musicology. It is known that Debussy heard Javanese music at the Paris Exposition of 1889. In his case, the appeal of the whole-tone scales may have been their tonal ambiguity, which fit with his own tendency to use a shifting harmonic palette that resisted clear tonal centers.
10 The methodology of musical CP studies is simple: The size of an interval is varied in small increments between two endpoints. For example, intervals might range from a minor third to a perfect fourth (3 to 5 semitones) in steps of 12.5 cents. Listeners are presented with intervals in a random order and asked to give each one a label from a set of N labels, in which N = the number of interval categories under investigation (3 in the above example: minor third, major third, and perfect fourth). Then, pairs of intervals that differ by a fixed amount in size are presented for same-different discrimination. The resulting identification and discrimination functions are then examined for evidence of CP. (In assessing such studies, it is important to discern whether the authors used intervals whose first tones were allowed to vary, or if the first tone was always kept at a fixed frequency. Only the former method forces the use of abstract interval categories and prevents pitch-matching strategies, and is thus a strong test of CP.)
11 Another way to do this study is to have a single deviant and vary the presence of a category boundary by varying the cultural background of the listeners. Thus the deviant would cross into a new category for some listeners but not for others. This approach has been used successfully in the study of native language vowel categories using the MMN (Näätänen et al., 1997).
12 Enriching the timbral diversity of ensembles has long been a concern of modern music. In an early modernist manifesto promoting the use of noise in music, Luigi Russolo (1913/1986) inveighed against the traditional orchestra’s limited timbral palette, calling the concert hall a “hospital for anemic sounds.”
13 Of course, the line between phonetics and phonology is not always clear cut, and there are a growing number of studies that combine these two approaches (e.g., see the LabPhon series of conferences and books). To take just one example, the study of speech rhythm was long the province of phonology, which divided languages into distinct rhythmic classes based on the patterning of syllables and stress. However, recent approaches to speech rhythm have combined the insights of phonology (in terms of the factors underlying speech rhythm) with phonetic measurements of speech, as discussed in Chapter 3.
14 A Web-based resource for IPA symbols and their corresponding sounds is provided by the UCLA phonetics laboratory: http://hctv.humnet.ucla.edu/departments/linguistics/VowelsandConsonants/course/chapter1/chapter1.html.
15 Analysis of phoneme features can also be based on the acoustics of sounds, leading to the concept of “auditory distinctive features” (Jakobson, Fant, & Halle, 1952/1961). For example, the feature “grave” refers to speech sounds with predominantly low frequency energy, no matter how they are produced. Thus it groups together sounds independently of their mode of articulation. Hyman (1973) gives examples of how this feature affects the patterning of speech sounds.
16 Although the syllable is an intuitive linguistic concept, there is debate among linguists over its precise definition. I have used a simplified definition here, based on the idea that a syllable must contain a vowel. In fact, one can have “syllabic consonants,” such as the /n/ of “button” when pronounced as “butn,” and syllables in which the nucleus is a voiced consonant rather than a vowel, for example, /tkznt/ in the Berber language (Coleman, 1999). For the purposes of this book, these subtleties are not important. What is worth noting is that although it is normally assumed that the phoneme is the elementary unit of speech perception, there are a number of speech scientists who argue that this role is actually played by the syllable, a debate that is unfortunately beyond the scope of this book (Greenberg, 2006).
17 This estimate, which is generally accepted by linguists, is based on information from Ethnologue: Languages of the World (see http://www.ethnologue.org).
18 I am grateful to Neelima K. Pandit for organizing a field trip to make this recording in Nigeria in 2000.
19 Independent of the putative link between tone languages and musical absolute pitch, it may be that learning a tone language shapes the perception of nonlinguistic pitch more generally. For example, Bent, Bradlow, and Wright (2006) showed that speakers of Mandarin and English differ in a nonspeech pitch contour identification task. Mandarin speakers showed a tendency to judge low flat contours as falling in pitch, and high flat contours as rising in pitch, which may reflect listening strategies associated with accurate categorization of linguistic tones in Mandarin. Interestingly, the Mandarin and English listeners showed no difference in contour discrimination tasks, suggesting that the influence of language on nonlinguistic pitch perception entered at the more abstract stage of categorization (vs. raw discrimination).
20 A key principle in the organization of talking drum messages is redundancy. Because many words have the same tonal pattern, it is necessary to disambiguate the words by placing them in longer phrases. These tend to be stereotyped, poetic phrases that form part of the oral tradition of a tribe. For example, the word “moon” may be embedded in the longer phrase “the moon looks down at the earth.” Carrington (1971) estimated that approximately eight syllables of drum language were needed for each syllable of ordinary language in order to convey unambiguous messages.
21 There is a basic distinction between “articulated” and “tonal” whistled speech. In the former, vowels and consonants have whistled counterparts, with vowels being indicated by differences in relative pitch, and consonants by pitch transitions and amplitude envelope cues (Busnel & Classe, 1976; Thierry, 2002; Rialland, 2003, 2005; Meyer, 2004). This allows the communication of virtually any message that can be spoken. The best-studied case occurs in the Canary Islands, though articulated whistled speech is also known from other regions, including the French Pyrenees and Turkey.
22 Indeed, when all F0 variation is removed from sentences of Mandarin, native speakers still understand the sentences with nearly 100% intelligibility, likely due to semantic and pragmatic constraints that result from having the words in sentence context (Patel & Xu, in preparation).
23 To be precise, the burst of a /p/ can involve three sources of sound with partial temporal overlap, all in the course of approximately 30 milliseconds: a transient burst at the lips, a brief turbulence (frication) in the narrow region between the lips just after the closure is released, and a brief aspiration noise at the vocal folds before they resume vibrating (Stevens, 1997:492-494).
24 An X-ray database of speech (with some online clips) is available courtesy of Kevin Munhall: http://psyc.queensu.ca/~munhallk/05_database.htm.
25 For those unfamiliar with IPA, some symbols used in table 2.3 are: ŋ (ng) is the final consonant in “sing,” θ is the initial consonant in “thin,” ð is the initial consonant in “the,” ʃ is the initial consonant in “she,” and ʒ is the second consonant in “unusual.” For the full IPA chart of sounds with audio examples, visit the website listed in footnote 14.
26 One way that languages can increase their inventory of vowels is by making phonemic contrasts between vowels based on length or voice quality. Thus in Japanese and Estonian, vowel duration is contrastive, so that a word can change its meaning if a short versus long version of the same vowel is used. In Jalapa Mazatec, a word can mean three entirely different things if the vowel is spoken in a normal (“modal”) voice, in a breathy voice, or a creaky voice (Ladefoged & Maddieson, 1996).
27 F3 is important in some vowel distinctions, for example, in the r-colored vowel in “bird.”
28 With one minor exception: One stimulus in the matrix was labeled as /wa/ by Japanese listeners.
29 I am grateful to Matt Davis for providing these examples, which are based on three-formant sine-wave speech, in which the amplitude profile of each formant (low-pass filtered at 50 Hz) is used to amplitude-modulate the sine waves (Davis & Johnsrude, 2007).
30 Those interested in the idea of linguistic versus nonlinguistic modes of perception may find it interesting to consider a phenomenon in certain West African drum ensembles. In such ensembles, the sounds of many drums intertwine, creating a polyrhythmic texture with a unique sound (Locke, 1990). These patterns are nonlinguistic, yet from time to time the lead drummer may use his drum to send a linguistic message by imitating the rhythms and tones of the local language (i.e., a talking drum message; cf. section 2.3.2, subsection “Mapping Linguistic Tone Contrasts Onto Musical Instruments”). Listeners who know the drum language will perceive this as a linguistic message. This raises a number of interesting questions: How does the perception of drum sounds as language interact with their perception as musical sounds? What patterns of activity would be observed in the brain of a person who is listening to nonlinguistic drumming and hears (and understands) an embedded drummed linguistic message, versus in the brain of another listener who hears the same talking drum but does not understand it as language?
31 In performing such tests, it is important to use culturally familiar scales, because one is focusing on learned sound categories in music. Studies using arbitrary, mathematically defined scales are difficult to interpret for this reason (e.g., Foxton et al., 2003).
32 Interestingly, this idea was inspired by a theory from speech production research, namely that speakers expend no more energy than necessary in making acoustic contrasts, given the needs of their listener (Lindblom, 1990).
33 Because the ratio of the upper to lower note in a musical fifth is 3:2, the missing fundamental is half the frequency of the lower note, giving a ratio of 3:2:1.
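The 3:2:1 arithmetic can be verified in a few lines. This sketch is purely illustrative; the 220 Hz lower note (A3) is my choice of example, not a figure from the text:

```python
def missing_fundamental(lower_hz):
    """Frequencies of a perfect fifth above lower_hz and the implied
    missing fundamental, one octave below the lower note."""
    upper_hz = lower_hz * 3 / 2      # musical fifth: upper/lower = 3:2
    fundamental_hz = lower_hz / 2    # half the lower note, giving 3:2:1
    return upper_hz, lower_hz, fundamental_hz

# With an illustrative lower note of 220 Hz (A3):
print(missing_fundamental(220.0))  # (330.0, 220.0, 110.0)
```

The three returned frequencies stand in the ratio 3:2:1, as stated in the footnote.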
34 One possible source of such households would be cultural groups in which music is frowned upon for cultural or religious reasons (e.g., certain pious Muslims). Care must be taken, however, that there are no other sources of musical intervals in the environment: for example, from the chanting of holy texts. Another way to seek such households would be to advertise for individuals who do not listen to music at home and whose infants have had minimal exposure to songs, musical toys, and television or radio shows with music. One hopes there are not many such households, but given human diversity, some are likely to exist.
35 In conducting such research it will be important to know if the vocalizations of the primates are harmonic. If so, then any bias for the fifth could reflect exposure to harmonic intervals in the spectra of their own communication systems, rather than innate auditory biases (a la Terhardt, 1984, and Schwartz et al., 2003). Furthermore, if non-human primates do not show a bias for the fifth, it is not clear what conclusions can be drawn. Given that virtually all non-human primate auditory research is done with monkeys (rather than with apes, which are genetically much closer to humans), it is possible that differences in the auditory systems of monkeys and humans would be responsible for the negative finding. Thus the only easily interpretable outcome of this research would be if the primates did show a processing advantage for the fifth.
Chapter 3 Rhythm
3.1 Introduction
3.2 Rhythm in Music
3.2.1 The Beat: A Stable Mental Periodicity
3.2.2 Meter: Multiple Periodicities
3.2.3 Grouping: The Perceptual Segmentation of Events
3.2.4 Durational Patterning in Music
Duration Categories in Music
Expressive Timing in Music
3.2.5 The Psychological Dimensions of Musical Rhythm
3.3 Rhythm in Speech
3.3.1 Rhythmic Typology
Periodicity and Typology
Phonology and Typology
Duration and Typology
Perception and Typology
3.3.2 Principles Governing the Rhythmic Shape of Words and Utterances
Differences Between Linguistic and Musical Metrical Grids
Questioning the Principle of Rhythmic Alternation in Speech
3.3.3 The Perception of Speech Rhythm
The Perception of Isochrony in Speech
The Role of Rhythmic Predictability in Speech Perception
The Role of Rhythm in Segmenting Connected Speech
The Role of Rhythm in the Perception of Nonnative Accents
3.3.4 Final Comments on Speech Rhythm: Moving Beyond Isochrony
3.4 Interlude: Rhythm in Poetry and Song
3.4.1 Rhythm in Poetry
3.4.2 Rhythm in Song
3.5 Nonperiodic Aspects of Rhythm as a Key Link
3.5.1 Relations Between Musical Structure and Linguistic Rhythm
3.5.2 Relations Between Nonlinguistic Rhythm Perception and Speech Rhythm
3.5.3 Neural Relationships Between Rhythm in Speech and Music
3.6 Conclusion
Appendixes
A.1 The nPVI Equation
A.2 Musical nPVI Values of Different Nations
The comparative study of spoken and musical rhythm is surprisingly underdeveloped. Although hundreds of studies have explored rhythm within each domain, empirical comparisons of linguistic and musical rhythm are rare. This does not reflect a lack of interest, because researchers have long noted connections between theories of rhythm in the two domains (e.g., Selkirk, 1984; Handel, 1989). The paucity of comparative research probably reflects the fact that specialists in one domain seldom have the time to delve into the intricacies of the other. This is regrettable, because cross-domain work can provide a broader perspective on rhythm in human cognition. One goal of this chapter is to equip researchers with conceptual and empirical tools to explore the borderland between linguistic and musical rhythm. As we shall see, this is a fertile area for new discoveries.
Before embarking, it is worth addressing two overarching issues. The first is the definition of rhythm. The term “rhythm” occurs in many contexts besides speech and music, such as circadian rhythms, oscillations in the brain, and the rhythmic calls of certain animals. In most of these contexts, “rhythm” denotes periodicity, in other words, a pattern repeating regularly in time. Although periodicity is an important aspect of rhythm, it is crucial to distinguish between the two concepts. The crux of the matter is simply this: Although all periodic patterns are rhythmic, not all rhythmic patterns are periodic. That is, periodicity is but one type of rhythmic organization. This point is especially important for understanding speech rhythm, which has had a long (and as we shall see, largely unfruitful) association with the notion of periodicity. Thus any definition of rhythm should leave open the issue of periodicity. Unfortunately, there is no universally accepted definition of rhythm. Thus I will define rhythm as the systematic patterning of sound in terms of timing, accent, and grouping. Both speech and music are characterized by systematic temporal, accentual, and phrasal patterning. How do these patterns compare? What is their relationship in the mind?
The second issue is the very notion of rhythm in speech, which may be unfamiliar to some readers. One way to informally introduce this concept is to consider the process of learning a foreign language. Speaking a language with native fluency requires more than mastering its phonemes, vocabulary, and grammar. One must also master the patterns of timing and accentuation that characterize the flow of syllables in sentences. That is, each language has a rhythm that is part of its sonic structure, and an implicit knowledge of this rhythm is part of a speaker’s competence in their language. A failure to acquire native rhythm is an important factor in creating a foreign accent in speech (Taylor, 1981; Faber, 1986; Chela-Flores, 1994).
The following two sections (3.2 and 3.3) give overviews of rhythm in music and speech, respectively, focusing on issues pertinent to cross-domain comparisons. (Such comparisons are made within each section where appropriate.) These overviews motivate a particular way of looking at rhythmic relations between speech and music. This new perspective is introduced in the final section of the chapter, together with empirical evidence spanning acoustic, perceptual, and neural studies.
The following discussion of rhythm in music focuses on music that has a regularly timed beat, a perceptually isochronous pulse to which one can synchronize with periodic movements such as taps or footfalls. Furthermore, the focus is on music of the Western European tradition, in which beats are organized in hierarchies of beat strength, with alternation between stronger and weaker beats. This form of rhythmic organization has been the most widely studied from a theoretical and empirical standpoint, and is also the type of rhythm most often compared with speech, either implicitly or explicitly (Pike, 1945; Liberman, 1975; Selkirk, 1984).
It is important to realize, however, that this is just one way in which humans organize musical rhythm. It would be convenient if the rhythmic structure of Western music indicated general principles of rhythmic patterning. Reality is more complex, however, and only a comparison of different cultural traditions can help sift what is universal from what is particular. To illustrate this point, one can note musical traditions in which rhythm is organized in rather different ways than in most Western European music.
One such tradition involves the Ch’in, a seven-string fretless zither that has been played in China for over 2,000 years (van Gulik, 1940). The musical notation for this instrument contains no time markings for individual notes, indicating only the string and type of gesture used to produce the note (though sometimes phrase boundaries are marked). The resulting music has no sense of a beat. Instead, it has a flowing quality in which the timing of notes emerges from the gestural dynamics of the hands rather than from an explicitly regulated temporal scheme. The Ch’in is just one of many examples of unpulsed music from around the globe, all of which show that the mind is capable of organizing temporal patterns without reference to a beat.
Another tradition whose rhythms are quite different from Western European music is Balkan folk music from Eastern Europe (Singer, 1974; London, 1995). This music has salient beats, but the beats are not spaced at regular temporal intervals. Instead, intervals between beats are either long or short, with the long interval being 3/2 the length of the shorter one. Rhythmic cycles are built from repeating patterns of long and short intervals, such as S-S-S-L, S-S-L-S-S (note that the long element is not constrained to occur at the end of the cycle). One might think that such an asymmetric structure would make the music difficult to follow or synchronize with. In fact, listeners who grew up with this music are adept at following these complex meters (Hannon & Trehub, 2005), and much of this music is actually dance music, in which footfalls are synchronized to the asymmetric beats.
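To make the asymmetric timing concrete, the following sketch computes beat onset times for a repeating long/short cycle in which the long beat is 3/2 the length of the short one. The pattern "SSSL" and the 200 ms short beat are illustrative assumptions, not measurements from Balkan performance:

```python
def cycle_onsets(pattern, short_ms, n_cycles=1):
    """Beat onset times (ms) for a repeating pattern of short ('S') and
    long ('L') beats, with long = 3/2 * short (the Balkan-style ratio)."""
    durations = {"S": short_ms, "L": short_ms * 3 // 2}
    onsets, t = [], 0
    for _ in range(n_cycles):
        for beat in pattern:
            onsets.append(t)
            t += durations[beat]
    return onsets

# One SSSL cycle with 200 ms short beats:
print(cycle_onsets("SSSL", 200))  # [0, 200, 400, 600]
# The long final beat (300 ms) makes the next cycle begin at 900 ms,
# so the beats are salient yet not spaced at equal intervals.
```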
As a final example of a rhythmic tradition with a different orientation from Western European music, Ghanaian drumming in West Africa shows a number of interesting features. First, the basic rhythmic reference is a repeating, non-isochronous time pattern played on a set of hand bells (Locke, 1982; Pantaleoni, 1985). Members of a drum ensemble keep their rhythmic orientation by hearing their parts in relation to the bell, rather than by focusing on an isochronous beat. Furthermore, the first beat of a rhythmic cycle is not heard as a “downbeat,” in other words, a specially strong beat (as in Western music); if anything, the most salient beat comes at the end of the cycle (Temperley, 2000). Finally, this music emphasizes diversity in terms of the way it can be heard. As different drums enter, each with its own characteristic repeating temporal pattern, a polyrhythmic texture is created that provides a rich source of alternative perceptual possibilities depending on the rhythmic layers and relationships one chooses to attend to (Locke, 1982; Pressing, 2002). This is quite different from the rhythmic framework of most Western European music, in which the emphasis is on relatively simple and perceptually consensual rhythmic structures. One possible reason for this difference is that Western music has major preoccupations in other musical dimensions (such as harmony), and a relatively simple rhythmic framework facilitates complex explorations in these other areas. Another reason may be that tempo in Western European music is often flexible, with salient decelerations and accelerations of the beat used for expressive purposes. A fairly simple beat structure may help a listener stay oriented in the face of these temporal fluctuations (cf. Temperley, 2004).
Thus it would be an error to assume that the rhythmic structure of Western European music reflects basic constraints on how the mind structures rhythmic patterns in terms of production or perception. As with every musical tradition, the rhythmic patterns of Western European music reflect the historical and musical concerns of a given culture. On the other hand, a comparative perspective reveals that certain aspects of rhythm in Western European music (such as a regular beat and grouping of events into phrases) are also found in numerous other cultures, which suggests that these aspects reflect widespread cognitive proclivities of the human mind.
The discussion below relies at times on one particular melody to illustrate various aspects of rhythmic structure in Western music. This is the melody of a children’s song, indexed as melody K0016 in a database of Bohemian folk melodies (Schaffrath, 1995; Selfridge-Feld, 1995). Figure 3.1 shows the melody in Western music notation and in “piano roll” notation with each tone’s pitch plotted as a function of time (the melody can be heard in Sound Example 3.1).
The melody was chosen because it is historically recent and follows familiar Western conventions, yet is unlikely to be familiar to most readers and is thus free of specific memory associations. It also illustrates basic aspects of rhythm in a simple form. Beyond this, there is nothing special about this melody, and any number of other melodies would have served the same purpose.
The phenomenon of a musical beat seems simple because it is so familiar. Almost everyone has tapped or danced along to music with a beat. A regular beat is widespread in musical cultures, and it is worth considering why this might be so. One obvious function of a beat is to coordinate synchronized movement, such as dance. (The relationship between dance and music is widespread in human societies; indeed, some cultures do not even have separate terms for music and dance.) A second obvious function of a beat is to provide a common temporal reference for ensemble performance. Indeed, a cross-cultural perspective reveals that ensemble music without a periodic temporal framework is a rare exception. Perlman (1997) points to one such exception in Javanese music known as pathetan, noting that “Except for certain isolated phrases, pathetan has no unifying metric framework. . . . Rhythmic unison is not desired, and the musicians need not match their attacks with the precision made possible by a definite meter” (p. 105). A detailed discussion of pathetan by Brinner (1995:245-267) suggests that it is an exception that proves the rule: without a metric frame, players substitute close attention to a lead melodic instrument (typically a rebab or bowed lute) in order to coordinate and orient their performance. Thus when periodicity in ensemble music is withdrawn, its functional role is filled in other ways.
Figure 3.1 A simple melody (K0016) in (A) music notation and (B) piano roll format. In (B), the y-axis shows the semitone distance of each pitch from C4 (261.63 Hz).
From a listener’s perspective, perception of a beat is often linked to movement in the form of synchronization to the beat. For many people, this synchronization is a natural part of musical experience requiring no special effort. It may come as a surprise, then, that humans are the only species to spontaneously synchronize to the beat of music. Although synchrony is known from other parts of the animal kingdom, such as the chorusing of frogs or the synchronized calls of insects (Gerhardt & Huber, 2002, Ch. 8; Strogatz, 2003), human synchronization with a beat is singular in a number of respects (see Chapter 7, section 7.5.3, for further discussion of this point). Of course, beat perception does not automatically cause movement (one can always sit still), but the human uniqueness of beat synchronization suggests that beat perception merits psychological investigation. Research in music cognition has revealed several interesting facts about beat perception.
First, there is a preferred tempo range for beat perception. People have difficulty following a beat that is faster than every 200 ms and slower than every 1.2 seconds. Within this range, there is a preference for beats that occur roughly every 500-700 ms (Parncutt, 1994; van Noorden & Moelants, 1999). It is interesting to note that this is the same range in which people are the most accurate at making duration judgments, in other words, they neither overestimate nor underestimate the duration of temporal intervals (Eisler, 1976; cf. Fraisse, 1982). Furthermore, this is the range in which listeners are the most accurate in judging slight differences in tempo (Drake & Botte, 1993). It is also interesting to note that in languages with stressed and unstressed syllables, the average duration between stressed syllables has been reported to be close to or within this range (Dauer, 1983; Lea, 1974, described in Lehiste, 1977).
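As a rough summary of the figures just cited, the reported ranges can be sketched in code; the category labels are mine, chosen for illustration, and are not standard terms in the literature:

```python
def classify_beat_period(period_ms):
    """Classify an inter-beat interval against the ranges reported above:
    beats are hard to follow outside roughly 200 ms - 1.2 s, with a
    preference for periods of about 500-700 ms."""
    if not 200 <= period_ms <= 1200:
        return "hard to follow"
    if 500 <= period_ms <= 700:
        return "preferred range"
    return "followable"

print(classify_beat_period(600))   # preferred range
print(classify_beat_period(1500))  # hard to follow
```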
Second, although people usually gravitate toward one particular beat tempo, they can tap at other tempi that are simple divisors or multiples of their preferred tapping rate (e.g., at double or half their preferred rate; Drake, Jones, & Baruch, 2000). For example, consider Sound Example 3.2, which presents K0016 along with two different indications of the beat. Both are perfectly possible, and it is likely that most people could easily tap at either level depending on whether they focus on lower or higher level aspects of rhythmic structure. Drake, Jones, and Baruch (2000) have shown that people vary in the level they synchronize with in music, and that their preferred level correlates with their spontaneous tapping rate. Furthermore, although individuals naturally gravitate to one particular level, they can move to higher or lower levels if they wish (e.g., by doubling or halving their tapping rate) and still feel synchronized with the music. Thus when speaking of “the beat” of a piece, it is important to keep in mind that what a listener selects as the beat is just one level (their tactus) in a hierarchy of beats.
Third, beat perception is robust to moderate tempo fluctuations. In many forms of music, the overall timing of events slows down or speeds up within phrases or passages as part of expressive performance (Palmer, 1997). People are still able to perceive a beat in such music (Large & Palmer, 2002) and synchronize to it (Drake, Penel, & Bigand, 2000), indicating that beat perception is based on flexible timekeeping mechanisms.
Fourth, there is cultural variability in beat perception. Drake and Ben El Heni (2003) studied how French versus Tunisian listeners tapped to the beat of French versus Tunisian music. The French tapped at a slower rate to French music than to Tunisian music, whereas the Tunisians showed the opposite pattern. Drake and Ben Heni argue that this reflects the fact that listeners can extract larger-scale structural properties in music with which they are familiar. These findings indicate that beat perception is not simply a passive response of the auditory system to physical periodicity in sound: It also involves cultural influences that may relate to knowledge of musical structure (e.g., sensitivity to how notes are grouped into motives; cf. Toiviainen & Eerola, 2003).
Fifth, and of substantial interest from a cognitive science standpoint, a perceived beat can tolerate a good deal of counterevidence in the form of accented events at nonbeat locations and absent or weak events at beat locations, in other words, syncopation (Snyder & Krumhansl, 2001). For example, consider Sound Examples 3.3 and 3.4, two complex temporal patterns studied by Patel, Iversen, et al. (2005) with regard to beat perception and synchronization. The patterns begin with an isochronous sequence of 9 tones that serves to indicate the beat, which has a period of 800 ms. After this “induction sequence,” the patterns change into a more complex rhythm but with the same beat period. Participants were asked to synchronize their taps to the isochronous tones and then continue tapping at the same tempo during the complex sequence. Their success at this task was taken as a measure of how well they were able to extract a beat from these sequences. In the “strongly metrical” (SM) sequences (Sound Example 3.3), a tone occurred at every beat position. In the “weakly metrical” (WM) sequences, however, about 1/3 of the beat positions were silent (Sound Example 3.4). (NB: The SM and WM sequences had exactly the same set of interonset intervals, just arranged differently in time; cf. Povel & Essens, 1985.) Thus successful beat perception and synchronization in WM sequences required frequent taps at points with no sound.
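A small sketch can illustrate the key manipulation: the same multiset of interonset intervals can yield one rhythm in which every beat position carries a tone and another in which some beat positions are silent. The interval values below are invented for illustration and are not Patel et al.'s actual stimuli (whose WM sequences left about 1/3 of beat positions silent):

```python
from itertools import accumulate

def beat_coverage(intervals, period=800):
    """Fraction of beat positions (every `period` ms) that coincide with
    a tone onset, for a rhythm given as interonset intervals in ms."""
    onsets = {0, *accumulate(intervals)}
    total = sum(intervals)
    beats = list(range(0, total + 1, period))
    return sum(1 for b in beats if b in onsets) / len(beats)

sm = [400, 400, 800, 400, 400, 800, 800]   # a tone on every 800 ms beat
wm = [400, 800, 400, 800, 400, 400, 800]   # same intervals, reordered
print(beat_coverage(sm))  # 1.0
print(beat_coverage(wm))  # ~0.83: one beat in six falls on silence
```

Tapping steadily through the `wm` rhythm thus requires a tap at a beat position where no tone sounds, which is exactly what most participants managed to do.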
All participants were able to synchronize with the beat of the SM sequence: Their taps were very close in time to the idealized beat locations. (In fact, taps typically preceded the beat by a small amount, a finding typical of beat synchronization studies, indicating that beat perception is anticipatory rather than reactive.) Of greater interest was performance on the WM sequences. Although synchronization was not as accurate as with the SM sequences as measured by tapping variability, most participants (even the musically untrained ones) were able to tap to the beat of these sequences, though from a physical standpoint there was little periodicity at the beat period. That is, most people tapped to the silent beats as if they were physically there, illustrating that beat perception can tolerate a good deal of counterevidence.
The above facts indicate that beat perception is a complex phenomenon that likely has sophisticated cognitive and neural underpinnings. Specifically, it involves a mental model of time in which periodic temporal expectancies play a key role (Jones, 1976). This may be one reason why it is unique to humans.
Beat perception is an active area of research in music cognition, in which there has long been an interest in the cues listeners use to extract a beat. Temperley and Bartlette (2002) list six factors that most researchers agree are important in beat finding (i.e., in inferring the beat from a piece of music). These can be expressed as preferences:
1. For beats to coincide with note onsets
2. For beats to coincide with longer notes
3. For regularity of beats
4. For beats to align with the beginning of musical phrases
5. For beats to align with points of harmonic change
6. For beats to align with the onsets of repeating melodic patterns
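These preferences can be made concrete with a toy scorer for candidate beats. This is my own illustration, not Temperley and Bartlette's model: it implements only the first three preferences, crediting each strictly regular beat that lands on a note onset, weighted by that note's duration. The example rhythm is invented:

```python
def beat_score(onsets, durations, period, phase):
    """Score a candidate isochronous beat against a rhythm: credit each
    beat that lands on a note onset (preference 1), weighted by that
    note's duration (preference 2); beats are strictly regular (3)."""
    weight = dict(zip(onsets, durations))
    t, score = phase, 0
    while t <= max(onsets):
        score += weight.get(t, 0)   # 0 if the beat falls between onsets
        t += period
    return score

onsets    = [0, 400, 600, 800, 1200, 1600]   # note onset times, ms
durations = [400, 200, 200, 400, 400, 400]
print(beat_score(onsets, durations, 400, 0))  # 1800: hits 5 of 6 notes
print(beat_score(onsets, durations, 800, 0))  # 1200: hits the long notes
```

Comparing scores across candidate (period, phase) pairs, the 400 ms beat starting at the first note wins, matching the intuition that beats should fall on note onsets, especially long ones.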
Because beat perception is fundamental to music and is amenable to empirical study, it has attracted computational, behavioral, and neural approaches (e.g., Desain, 1992; Desain & Honing, 1999; Todd et al., 1999; Large, 2000; Toiviainen & Snyder, 2003; Hannon et al., 2004; Snyder & Large, 2005; Zanto et al., 2006) and has the potential to mature into a sophisticated branch of music cognition in which different models compete to explain a common set of behavioral and neural data. Its study is also attractive because it touches on larger issues in cognitive neuroscience. For example, synchronization to a beat provides an opportunity to study how different brain systems are coordinated in perception and behavior (in this case, the auditory and motor systems). A better understanding of the mechanisms involved in beat perception and synchronization could have applications for physical therapy, in which synchronization with a beat is being used to help patients with neuromotor disorders (such as Parkinson’s disease) to initiate and coordinate movement (Thaut et al., 1999; cf. Sacks, 1984, 2007).
In Western European music, beats are not all created equal. Instead, some beats are stronger than others, and this serves to create a higher level of periodicity in terms of the grouping and/or accentuation of beats. For example, the beats of a waltz are grouped in threes, with an accent on the first beat of each group, whereas in a march beats are grouped into twos or fours, with primary accent on the first beat (in a four-beat march, there is secondary accent on the third beat).
Waltzes and marches are but two types of meter in a broad diversity of meters used in Western European music, but they serve to illustrate some general features of meter in this tradition. First, the meters of Western music are dominated by organization in multiples of two and three, in terms of how many beats constitute a basic unit (the measure) and how many subdivisions of each beat there are. For example, a waltz has three beats per measure, each of which can be subdivided into two shorter beats, whereas a march has two (or four) beats per measure, each of which can also be subdivided into two beats. Many other possibilities exist, for example two beats per measure, each of which is subdivided into three beats.1 The key point is that meter typically has at least one level of subdivision below the beat (London, 2002, 2004:34), in addition to periodicity above the beat created by the temporal patterning of strong beats. One way to represent this is via a metrical grid, which indicates layers of periodicity using rows of isochronous dots. One of these rows represents the tactus, with the row above this showing the periodic pattern of accentuation above the tactus. Other rows above or below the tactus show other psychologically accessible levels of periodicity (Figure 3.2 shows a metrical grid for K0016).
Thus one should be able to tap to any of these levels and still feel synchronized with the music. (The use of dots in metrical grids indicates that meter concerns the perceptual organization of points in time, which in physical terms would correspond to the perceptual attacks of tones; Lerdahl & Jackendoff, 1983.) In grid notation, the relative strength of each beat is indicated by the number of dots above it, in other words, the number of layers of periodicity it participates in. Dots at the highest and lowest level must fall within the “temporal envelope” for meter: Periodicities faster than 200 ms and slower than ~4-6 s are unlikely to be spontaneously perceived as part of a metric framework. (Note that the upper end of this envelope is substantially longer than the ~1.2 second limit for following a beat mentioned in section 3.2.1. That shorter limit refers to beat-to-beat intervals, whereas 4-6 s refers to the highest metrical levels and is likely related to our sense of the psychological present; cf. London, 2002, 2004:30.)2
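A metrical grid of the kind just described can be sketched as rows of isochronous time points, with a beat's strength given by how many layers it participates in. The code below is an illustrative sketch, not notation from the book: layer periods of 1, 2, and 4 lowest-level beats yield a march-like grid with a primary accent on beat 1 and a secondary accent on beat 3, as described in the text.

```python
# A metrical grid as layers of isochronous time points (illustrative
# sketch). The strength of a grid position is the number of periodic
# layers it participates in, as in grid notation.

def metrical_grid(n_positions, layer_periods=(1, 2, 4)):
    """Return, for each grid position, its beat strength (layer count)."""
    return [sum(1 for p in layer_periods if i % p == 0)
            for i in range(n_positions)]

# Two measures of a four-beat march at the lowest metrical level:
print(metrical_grid(8))  # -> [3, 1, 2, 1, 3, 1, 2, 1]
```

Tapping "at any level" corresponds to choosing one layer period and attending only to positions that layer marks; the strongest positions (strength 3 here) are the downbeats.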
Figure 3.2 Metrical structure of K0016. A typical tactus is shown by the metrical level labeled 1x. Phrase boundaries are indicated below the piano roll notation (p1 = phrase 1, etc.).
Before moving on, the relationship between accent and meter should be discussed. This is an important relationship, because strong beats are perceptually accented points in the music. This kind of accent does not always rely on physical cues such as intensity or duration (note, for example, that all tones in K0016 are of equal intensity), and emerges from the detection of periodicity at multiple time-scales. There are of course many physical (or “phenomenal”) accents in music due to a variety of factors, including duration, intensity, and changes in melodic contour. There are also “structural accents” due to salient structural points in the music, for example, a sudden harmonic shift or the start of a musical phrase (Lerdahl & Jackendoff, 1983). The interplay of different accent types is one of the sources of complexity in music (Jones, 1993), particularly the interplay of metrical accents with off-beat phenomenal or structural accents. For example, syncopation in music illustrates the successful use of phenomenal accents “against the grain” of the prevailing meter. This raises a key point about the musical metrical grid, namely that it is a mental pattern of multiple periodicities in the mind of a listener, and not simply a map of the accentual structure of a sequence. This point will become relevant in the discussion of metrical grids in language.
The influence of musical meter on behavior, perception, and brain signals has been demonstrated in a number of ways. Sloboda (1983) had pianists perform the same sequence of notes set to different time signatures (in music, the time signature indicates the grouping and accentuation pattern of beats, i.e., the meter). The durational patterning of the performances differed substantially depending on the meter, and in many cases a given pianist did not even realize they were playing the same note sequence in two different meters. A demonstration of meter’s effect on synchronization comes from Patel, Iversen, et al. (2005), who showed that tapping to a metrical pattern differs from tapping to a simple metronome at the same beat period. Specifically, taps to the first beat of each metric cycle (i.e., the “downbeats” in the strongly metrical sequences of Sound Example 3.3) were closer to the physical beat than taps on other beats. Importantly, these downbeats (which occurred every four beats) were identical to other tones in terms of intensity and duration, so that the influence of downbeats on tapping was not due to any physical accent but to their role in creating a four-beat periodic structure in the minds of listeners.
In terms of meter’s influence on perception, Palmer and Krumhansl (1990) had participants listen to a sequence of isochronous tones and imagine that each event formed the first beat of groups of two, three, four or six beats. After a few repetitions, a probe tone was sounded and participants had to indicate how well it fit with the imagined meter. The ratings reflected a hierarchy of beat strength (cf. Jongsma et al., 2004). Turning to neural studies, Iversen, Repp, and Patel (2009) had musically trained participants listen to a metrically ambiguous repeating two-note pattern and mentally impose a downbeat in a particular place. Specifically, in half of the sequences they imagined that the first tone was the downbeat, and in the other half they imagined that the second tone was the downbeat. Participants were instructed not to move or to engage in motor imagery. Measurement of brain signals from auditory regions using magnetoencephalography (MEG) revealed that when a note was interpreted as the downbeat, it evoked an increased amount of neural activity in a particular frequency band (beta, 20-30 Hz) compared to when it was not a downbeat (even though the tones were physically identical in the two conditions).3 A control experiment showed that the pattern of increased activity closely resembled the pattern observed when the note in question was in fact physically accented (Figure 3.3). These results suggest that the perception of meter involves the active shaping of incoming signals by a mental periodic temporal-accentual scheme.
Grouping refers to the perception of boundaries, with elements between boundaries clustering together to form a temporal unit. This can be illustrated with K0016. In listening to this melody, there is a clear sense that it is divided into phrases, schematically marked in Figure 3.4.
The perceptual boundaries of the first two phrases are marked by silences (musical rests). Of greater interest are the boundaries at the end of the third and the fourth phrases, which are not marked by any physical discontinuity in the tone sequence, but are nevertheless salient perceptual break points.
As emphasized by Lerdahl and Jackendoff (1983), grouping is distinct from meter, and the interaction of these two rhythmic dimensions plays an important role in shaping the rhythmic feel of music. For example, anacrusis, or upbeat, is a rhythmically salient phenomenon involving a slight misalignment between grouping and meter, in other words, a phrase starting on a weak beat (such as phrase 2 of K0016).
Psychological evidence for perceptual grouping in music comes from a number of sources. Memory experiments show that if a listener is asked to indicate whether a brief tone sequence was embedded in a previously heard longer tone sequence, performance is better when the excerpt ends at a group boundary in the original sequence than when it straddles a group boundary (Dowling, 1973; Peretz, 1989). This suggests that grouping influences the mental chunking of sounds in memory. Further evidence for grouping comes from studies that show how grouping warps the perception of time. For example, clicks placed near phrase boundaries in musical sequences perceptually migrate to those boundaries and are heard as coinciding with them (Sloboda & Gregory, 1980; Stoffer, 1985). More evidence for perceptual warping based on grouping comes from a study by Repp (1992a), in which participants repeatedly listened to a computer-generated isochronous version of the opening of a Beethoven minuet. The task was to detect a lengthening in 1 of 47 possible positions in the music. Detection performance was particularly poor at phrase boundaries, probably reflecting an expectation for lengthening at these points (Repp further showed that these were the points at which human performers typically slowed down to mark the phrase structure). Finally, in a gating study of recognition for familiar melodies in which successively longer fragments of tunes were heard until they were correctly identified, Schulkind et al. (2003) found that identification performance was highest at phrase boundaries.4 Thus there is abundant evidence that grouping plays a role in musical perception.
Figure 3.3 (A) Repeating two-note rhythmic pattern, in which the listener imagines the downbeat on either the first tone (left) or second tone (right). (B) Evoked neural responses (measured over auditory brain regions) to the two-tone pattern subjectively interpreted in two different ways, in other words, with the downbeat on tone 1 versus tone 2. (The onset times of tones 1 and 2 are indicated by thin, vertical, gray lines at 0 and 0.2 s). The solid and dashed black lines show across-subject means for the two imagined beat conditions (solid = beat imagined on tone 1, dashed = beat imagined on tone 2). Data are from the beta frequency range (20-30 Hz). The difference is shown by the dotted line, with shading indicating 1 standard error. (C) Evoked neural responses in the beta frequency range to a two-tone pattern physically accented in two different ways, with the accent on tone 1 (solid line) versus tone 2 (dashed line).
Figure 3.4 K0016 segmented into melodic phrases (p1 = phrase 1, etc.).
What cues do listeners use in inferring grouping structure in music? Returning again to K0016, the ends of phrases 3 and 4 are marked by local durational lengthening and lowering of pitch. It is notable that these cues have been found to be important in the prosodic marking of clause endings in speech (Cooper & Sorensen, 1977). Even infants show sensitivity to these boundary cues in both speech and music (Hirsh-Pasek et al., 1987; Krumhansl & Jusczyk, 1990; Jusczyk & Krumhansl, 1993). For example, infants prefer to listen to musical sequences in which pauses are inserted after longer and lower sounds rather than at other locations, presumably because in the former case the pauses coincide with perceptual boundaries. Of course, there is much more to grouping than just these two cues. Deliège (1987) found that salient changes in intensity, duration, pitch, and timbre can all play a role in demarcating the edges of groups. Another factor that is likely to be important is motivic repetition, for example, a repeating pattern of the same overall duration and internal durational patterning. When these different cues are in conflict, people can disagree about where they hear grouping boundaries (Peretz, 1989). The interaction of different factors in grouping perception is a topic that draws continuing interest, because data on the perceived segmentation of pieces is relatively easy to collect (Clarke & Krumhansl, 1990; Deliège et al., 1996; Frankland & Cohen, 2004; Schaefer et al., 2004).
Grouping plays a prominent role in modern cognitive theories of music, in which it is conceived of as hierarchical, with lower level groups nested within higher level ones. For example, a theoretical analysis of grouping in K0016 would add layers above and below the phrase layer. Below the phrase layer, each phrase would be parsed into smaller groups (motives); above the phrase layer, phrases would be linked into higher level structures. For example, one might unite Phrases 1 and 2 into a group, followed by a group consisting of Phrases 3 and 4, and a final group coincident with Phrase 5. One of the most developed theoretical treatments of hierarchical grouping in music is that of Lerdahl and Jackendoff (1983), who propose certain basic constraints on grouping structure such as the constraint that a piece must be fully parsed into groups at each hierarchical level, and that boundaries at higher levels must coincide with those at lower levels. Evidence for multiple layers of grouping structure in music comes from research by Todd (1985), who showed that the amount of lengthening at a given phrase boundary in music is predicted by the position of that boundary in a hierarchical phrase structure of a piece.
The hierarchical view of grouping structure in music shows strong parallels to theories of prosodic structure in modern linguistic theory, notably the concept of the “prosodic hierarchy” (Selkirk, 1981; Nespor & Vogel, 1983). The prosodic hierarchy refers to the organization of sonic groupings at multiple levels in speech, ranging from the syllable up to the utterance. A key conceptual point made by all such theories is that these groupings are not simple reflections of syntactic organization. To take a well-known example, consider the difference between the syntactic bracketing of a sentence in 3.1a versus its prosodic phrasal bracketing 3.1b (Chomsky & Halle, 1968):
(3.1a) This is [the cat [that caught [the rat [that stole [the cheese]]]]]
(3.1b) [This is the cat] [that caught the rat] [that stole the cheese]
Prosodic grouping reflects a separate phonological level of organization that is not directly determined by syntactic structure. Instead, other linguistic factors play an important role, such as the semantic relations between words and the desire to place focus on certain elements (Marcus & Hindle, 1990; Ferreira, 1991). Furthermore, there are thought to be purely rhythmic factors such as a tendency to avoid groups that are very short or very long, and a tendency to balance the lengths of groups (Gee & Grosjean, 1983; Zellner Keller, 2002). The prosodic grouping structure of a sentence is by no means set in stone: There are differences among individuals in terms of how they group the words of the same sentence, and the grouping structure of a sentence can vary with speech rate (Fougeron & Jun, 1998). Nevertheless, grouping is not totally idiosyncratic, and psycholinguists have made good progress in predicting where speakers place prosodic boundaries in a sentence based on syntactic analyses of sentences (Watson & Gibson, 2004).
Although Example 3.1 above only shows one level of prosodic phrasing, modern theories of the prosodic hierarchy posit multiple levels nested inside one another. Theories vary in the number of levels they propose (Shattuck-Hufnagel & Turk, 1996),5 so for illustrative purposes only one such theory is discussed here. Hayes (1989) posits a five-level hierarchy comprised of words, clitic groups, phonological phrases, intonational phrases, and utterances. Figure 3.5 shows a prosodic hierarchy for a sentence according to this theory, with the syntactic structure also shown for comparison. (Note that a clitic group combines a lexical word that has a stressed syllable with an adjacent function word—an unstressed syllable—into a single prosodic unit. See Hayes, 1989, for definitions of other units.)
One form of evidence offered for the existence of a given level in the prosodic hierarchy is a systematic variation in the realization of a phonemic segment that depends on prosodic structure at that level. For example, Hayes (1989) discusses /v/ deletion in English speech as an example of a rule that operates within the clitic group. Thus it is acceptable to delete the /v/ in American English when saying, “Will you [save me] a seat?” because “save me” is a clitic group. (That is, if you listen carefully to an American English speaker say this phrase rapidly, “save” is often acoustically realized as “say,” though it is intended as—and heard as—“save”.) In contrast, the /v/ is not deleted when saying “[save] [mom]” because [save] and [mom] are two separate clitic groups. Other evidence that has been adduced for prosodic constituents includes preferences for interruption points between, rather than within, constituents (Pilon, 1981), and speeded word spotting at the boundaries of constituents (Kim, 2003).
Evidence that the prosodic hierarchy has multiple levels comes from phonetic modifications of speech that vary in a parametric fashion with the height of the prosodic boundary at the phoneme’s location (this corresponds to the number of coincident prosodic boundaries at that point, as higher level boundaries are always coincident with lower level ones). For example, Cho and Keating (2001) showed that in Korean, the voice-onset time of stop consonants is larger at higher prosodic boundaries, and Dilley, Shattuck-Hufnagel, and Ostendorf (1996) have shown that the amount of glottalization of word-onset vowels is greater at higher level boundaries.
Figure 3.5 (A) Syntactic and (B) prosodic hierarchy for a sentence of English. Abbreviations for (A): S = sentence, PP = prepositional phrase, NP = noun phrase, VP = verb phrase, Det = determiner, A = adjective, N = noun, V = verb. Abbreviations for (B): U = utterance, I = intonational phrase, P = phonological phrase, C = clitic group, W = word. Adapted from Hayes, 1989.
Another phenomenon that supports the notion of hierarchical grouping in speech is variation in perceived juncture between words (Jun, 2003). In connected speech, words are acoustically run together and the silent intervals that do occur (e.g., due to stop consonants) are not necessarily located at word boundaries (cf. Chapter 2, section 2.3.3, subsection “A Brief Introduction to the Spectrogram”). Nevertheless, words are perceived as separated from one another. Unlike with written language, however, the perceived degree of spacing is not identical between each pair of words: Rather, some word boundaries seem stronger than others. For example, the sentence in 3.1c below contains juncture markings from a system devised by Price et al. (1991). In this system, a researcher listens repeatedly to a given sentence and places numerals from 0 to 6 between each pair of words to indicate the degree of perceived separation between them. A “break index” of 0 indicates the weakest perceived juncture, in other words, between the words of a clitic group. At the opposite extreme, a break index of 6 indicates the end of a sentence.
(3.1c) Only 1 one 4 remembered 3 the 0 lady 1 in 1 red 6.
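The break-index notation in 3.1c is straightforward to represent programmatically. The sketch below is a hypothetical helper (not part of Price et al.'s actual system): it pairs each word with the break index that follows it, so that, for example, the strongest sentence-internal boundary can be located.

```python
# Parse a break-index-annotated sentence (Price et al.'s 0-6 scale)
# into (word, break_index) pairs. Illustrative helper, not their tool.

def parse_break_indices(marked):
    """Split 'word N word N ...' into (word, break_index) pairs."""
    tokens = marked.replace(".", "").split()
    words, indices = tokens[0::2], map(int, tokens[1::2])
    return list(zip(words, indices))

pairs = parse_break_indices(
    "Only 1 one 4 remembered 3 the 0 lady 1 in 1 red 6.")
# The strongest boundary inside the sentence (excluding the final 6):
strongest_word, _ = max(pairs[:-1], key=lambda wb: wb[1])
print(strongest_word)  # -> "one" (break index 4 follows it)
```

Note how the 0 after "the" marks the weakest juncture, inside the clitic group "the lady", while the 6 after "red" marks the sentence end.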
Wightman et al. (1992) studied the relationship between these break indices and speech duration patterns in a large speech corpus, and found a correlation between perceived boundary strength and amount of lengthening of the syllable preceding the boundary (cf. Gussenhoven & Rietveld, 1992).6 This finding is strikingly reminiscent of the research on music by Todd (1985) described above. Another parallel to music is that durational lengthening interacts with pitch and amplitude cues in determining the perceived strength of prosodic boundaries (Streeter, 1978; de Pijper & Sanderman, 1994).
In conclusion, grouping is a fundamental rhythmic phenomenon that applies to both musical and linguistic sequences. In both domains, the mind parses complex acoustic patterns into multiple levels of phrasal structure, and music and language share a number of acoustic cues for marking phrase boundaries. These similarities point to shared cognitive processes for grouping across the two domains, and indicate that grouping may prove a fruitful area for comparative research. As discussed in section 3.5, empirical work is proving these intuitions correct.
Up to this point, the discussion of musical rhythm has been concerned with points and edges in time: beats and grouping boundaries. A different set of issues in rhythm research concerns how time gets filled, in other words, the durational patterning of events.
In music, the durational patterning of events is typically measured by the time intervals between event onsets within a particular event stream: This defines a sequence of interonset intervals (IOIs). For example, the sequence of IOIs between the tones of a melody defines the durational patterning of that melody. Typically, durations tend to be clustered around certain values reflecting the organization of time in music into discrete categories. Fraisse (1982) pointed out that two categories that figure prominently in Western musical sequences are short times of 200-300 ms and long times of 450-900 ms (cf. Ross, 1989). He argued that these two duration categories were not only quantitatively different but also different in terms of their perceptual properties: Long intervals are perceived as individual units with distinct durations, whereas short intervals are perceived collectively in terms of their grouping patterns rather than in terms of individual durations.
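The IOI definition above, together with Fraisse's two rough duration categories, can be sketched in a few lines. The category cutoffs (200-300 ms short, 450-900 ms long) are taken from the text; treating them as hard boundaries is an illustrative simplification, not a perceptual model.

```python
# Interonset intervals (IOIs) from note onsets, labeled with Fraisse's
# two rough duration categories. Cutoffs are illustrative hard bounds.

def iois(onsets_ms):
    """Successive differences between event onsets (in ms)."""
    return [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]

def fraisse_label(ioi_ms):
    if 200 <= ioi_ms <= 300:
        return "short"
    if 450 <= ioi_ms <= 900:
        return "long"
    return "other"

seq = [0, 250, 500, 1000, 1250, 1750]        # onsets in ms
labels = [fraisse_label(x) for x in iois(seq)]
# labels == ['short', 'short', 'long', 'short', 'long']
```

The clustering of IOIs into a small number of such categories is what distinguishes musical durational patterning from the continuous duration distributions of speech discussed next.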
There is empirical evidence that durations in musical rhythms are perceived in terms of categories (Clarke, 1987; Schulze, 1989). For example, Clarke (1987) had music students perform a categorical perception experiment on rhythm. Participants heard short sequences of tones in which the last two tones had a ratio that varied between 1:1 and 1:2. Listeners had to identify the final duration ratio as one or the other of these, and also had to complete a task that required them to discriminate between different ratios. The results showed a steep transition in the identification function, and increased discrimination when stimuli were near the boundary versus within a given region. (Clarke also found that the location of the boundary depended on the metrical context in which the sequence was perceived, providing another example of the influence of meter on perception; cf. section 3.2.2.)
In speech, the duration of basic linguistic elements (such as phonemes and syllables) is influenced by a number of factors. For example, there are articulatory constraints on how fast different sounds can be produced, which creates different minimum durations for different sounds (Klatt, 1979). There are also systematic phonological factors that make some sounds longer than others. For example, in English, the same vowel tends to be longer if it occurs before a final stop consonant that is voiced rather than unvoiced (e.g., the /i/ in “bead” vs. “beet”), and this difference influences the perception of the final stop as voiced or voiceless (Klatt, 1976). A simple phonological factor that influences syllable duration is the number of phonemes in the syllable: Syllables with more phonemes tend to be longer than those with fewer phonemes (e.g., “splash” vs. “sash”; Williams & Hiller, 1994). Atop these sources of variation are other sources including variations in speaking style (casual vs. clear), and variations in speech rate related to discourse factors, such as speeding up near the end of a sentence to “hold the floor” in a conversation (Schegloff, 1982; Smiljanic & Bradlow, 2005). Given all these factors, it is not surprising that the durations of speech elements do not tend to cluster around discrete values. Instead, measurements of syllable or phoneme duration typically reveal a continuous distribution with one main peak. For example, Figure 3.6a shows a histogram of syllable durations for a sample of spoken English.
Figure 3.6a Histogram of syllable durations in a corpus of spontaneous speech in American English. Data are from approximately 16,000 syllables. Mean syllable duration = 191 ms, sd = 125 ms. Syllables with duration >750 ms are not shown (<1% of total). Histogram bin size = 10 ms. Analysis based on data from Greenberg, 1996.
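The analysis behind a figure like 3.6a can be sketched in a few lines: bin syllable durations at 10 ms resolution and summarize the distribution. The durations below are synthetic draws, not Greenberg's (1996) corpus measurements; the distributional parameters are simply borrowed from the caption for illustration.

```python
import random
import statistics

# Synthetic syllable durations in ms; a real analysis (e.g., Greenberg, 1996)
# would use durations measured from a transcribed speech corpus.
random.seed(0)
durations = [random.gauss(191, 125) for _ in range(16000)]
durations = [d for d in durations if d > 0]  # durations must be positive

BIN_MS = 10  # histogram bin size used in Figure 3.6a

def histogram(values, bin_size):
    """Count values into fixed-width bins keyed by each bin's start."""
    counts = {}
    for v in values:
        b = int(v // bin_size) * bin_size
        counts[b] = counts.get(b, 0) + 1
    return counts

hist = histogram(durations, BIN_MS)
mean = statistics.mean(durations)
sd = statistics.stdev(durations)
print(f"mean = {mean:.0f} ms, sd = {sd:.0f} ms, "
      f"modal bin = {max(hist, key=hist.get)} ms")
```

A continuous, single-peaked histogram of this kind is exactly what the text contrasts with the clustering around discrete duration categories expected in metrical music.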
Having said this, it is important to note that durational categories do occur in some languages. For example, there are languages with phonemic length contrasts in which the same word can mean entirely different things when a short versus long version of the same vowel or consonant is used. In some languages, such as Estonian, there can even be three-way length contrasts. For example, “sata” can mean three entirely different things (“hundred,” “send,” and “get”) depending on the length of the first /a/. It would be interesting to study length contrasts in a given vowel phoneme and examine the amount of temporal variability within each duration category in connected speech. This could be compared to temporal variability of a given duration category in music, to see whether the perceptual system has a similar tolerance for within-category variability in the two domains.7
If the perceptual system cared only about musical durations as a sequence of discrete categories, then computer renditions of musical pieces based on exact renderings of music notation would be perfectly acceptable to listeners. Although such mechanical performances do occur in some settings (e.g., rhythm tracks in some modern popular music), in other contexts, such as the classical piano repertoire, such performances are rejected as unmusical. Not surprisingly then, physical measurements of human performances reveal considerable deviations from notated durations. For example, Figure 3.6b shows a histogram of IOIs, all of which represent realizations of notes with the same notated duration (an eighth note or quaver) from a famous pianist’s rendition of Schumann’s Träumerei (Repp, 1992b).8 Had the piece been performed by a machine, all of these IOIs would be a single value. Instead, considerable variation is seen. The crucial fact about this variation is that it is not “noise”: It largely represents structured variation related to the performer’s interpretation of the piece (Palmer, 1997; Ashley, 2002). For example, Repp (1992b) studied several famous pianists’ renderings of Träumerei and found that all showed slowing of tempo at structural boundaries, with the amount of slowing proportional to the importance of the boundary (cf. Todd, 1985). At a finer timescale, Repp found that within individual melodic phrases there was a tendency to accelerate at the beginning and slow near the end, with the pattern of IOIs following a smooth parabolic function. Repp speculated that this pattern may reflect principles of human locomotion, in other words, a musical allusion to physical movement (cf. Kronman & Sundberg, 1987).
Figure 3.6b Histogram of durations of eighth notes from a performance of Schumann’s Träumerei by Claudio Arrau. The large values in the right tail of the histogram are due to phrase-final ritards. Data are from approximately 170 eighth notes. Mean note duration = 652 ms, sd = 227 ms. Notes with duration > 1,600 ms are not shown (<1% of total). Histogram bin size = 50 ms.
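The notion of an expressive timing profile can be made concrete with a small sketch: compute inter-onset intervals (IOIs) from a sequence of note onset times and subtract the nominal (notated) duration. The onset times and nominal duration below are invented for illustration, not taken from Repp's (1992b) measurements.

```python
# Hypothetical onset times (ms) for six successive eighth notes;
# a mechanical performance would space them exactly NOMINAL_MS apart.
onsets_ms = [0, 640, 1290, 1915, 2600, 3350]
NOMINAL_MS = 650  # hypothetical notated eighth-note duration

# IOI = time between successive note onsets.
iois = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]

# The expressive timing profile: deviation of each IOI from the nominal value.
deviations = [ioi - NOMINAL_MS for ioi in iois]
print(iois)        # [640, 650, 625, 685, 750]
print(deviations)  # [-10, 0, -25, 35, 100]
```

The growing final deviation mimics the phrase-final slowing described in the text; a machine rendition would yield an all-zero profile.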
The above paragraph focuses on the role of IOIs in expressive timing. IOIs are the basis of “expressive timing profiles,” time series that show the actual pattern of event timing versus the idealized pattern based on notated duration. Although studies of these profiles have dominated research on expressive timing, it is important not to overlook another aspect of expressive timing, namely, articulation. Although IOI refers to the time interval between the onsets of successive tones, articulation refers to the time between the offset of one tone and the onset of the next. If there is little time between these events (or if the tones overlap so that the offset of the prior tone occurs after the onset of the following tone, which is possible in piano music), this is considered “legato” articulation. In this type of articulation, one tone is heard as flowing smoothly into the next. In contrast, staccato articulation involves a salient gap between offset and onset, giving the tones a rhythmically punctuated feel. In addition to IOI and articulation patterns, another important cue to musical expression is the patterning of tone intensity.
Due to the fact that timing, articulation and intensity in music can be measured with great precision using modern technology (e.g., using pianos with digital interfaces, such as the Yamaha Disklavier), expression has been a fruitful area of research in studies of music production. There has also been some research on expressive features in perception. For example, listeners can reliably identify performances of the same music as expressive, deadpan (mechanical), or exaggerated (Kendall & Carterette, 1990), and can identify the performer’s intended emotion on the basis of expressive features (Gabrielsson & Juslin, 1996).
Palmer (1996) has shown that musically trained listeners can identify a performer’s intended metrical and phrase structure on the basis of expressive cues. One clever demonstration of the perceptual importance of expressive timing was provided by Clarke (1993), who used naturally performed short melodies. For each melody, Clarke extracted its expressive timing profile, manipulated it, and then reimposed it on a mechanical performance of the melody, thus creating a Frankensteinian melody with structure and expression mismatched. For example, in one condition the original note-by-note expressive timing profile was shifted several notes to the right. Musicians judged the originals versus the mismatched melodies in terms of the quality of performance, and favored the originals. Thus listeners are sensitive to the way expressive timing aligns with the structure of musical passages.
Expressive timing in music has an interesting relationship to prosodic structure in speech. Just as a musical passage played by different performers will have different expressive timing patterns, the same sentence spoken by different speakers will have a different temporal patterning of syllables and phonemes. In the past, researchers have suggested that these individualistic aspects of performance are “normalized away” in memory for musical and spoken sequences, arguing that the abstract memory representation favors a less detailed, more categorical structure (Large et al., 1995; Pisoni, 1997). More recent research, however, suggests that listeners retain some temporal information in memory for speech and music (Bradlow et al., 1999; Palmer et al., 2001). For example, Palmer et al. (2001) familiarized listeners with particular performances of short melodic sequences, and then later tested the ability to recognize these performances against other performances of the same sequences. The different performances were generated by a pianist who produced the same short melodic sequences as part of longer melodies that differed in their metrical structure (3/4 vs. 4/4 time). As a result of the differing metrical structure, the same melodic sequence was produced with different patterns of articulation and intensity. For each such melodic sequence, both musicians and nonmusicians were able to recognize the original version they had heard when presented with it versus another version. Furthermore, even 10-month-old infants discriminated between familiar and unfamiliar performances, orienting longer toward the former. Palmer et al. relate these findings to research in speech perception showing that listeners retain stimulus-specific acoustic properties of words along with abstract linguistic properties (Luce & Lyons, 1998).
Another line of research relating timing in music to speech prosody concerns “tempo persistence.” Jungers et al. (2002) had pianists alternate between listening to short melodies and sight-reading different short melodies. The participants were told to attend to both the heard and performed melodies for a later memory test. In fact, the real question of interest was the relationship between the tempo of the heard and performed melodies. The heard melodies occurred in blocks of slow and fast tempi, and Jungers et al. found that the tempo of performed melodies was influenced by the tempo of heard melodies: The pianists played more slowly after slow melodies and faster after fast melodies. A similar experiment using spoken sentences rather than melodies showed a similar tempo persistence effect in speech. These findings are reminiscent of research on “accommodation” in sociolinguistics, which has shown that when people of different social backgrounds meet, their speech becomes more alike (cf. Giles et al., 1991).
In an interesting follow-up study, Dalla Bella et al. (2003) studied tempo persistence across modalities. Listeners (both musicians and nonmusicians) alternated between hearing melodies and reading sentences aloud. The musicians showed a tempo persistence effect: They spoke faster after hearing faster melodies. However, the nonmusicians showed no such effect. Furthermore, when the musicians did the reverse experiment (in which they alternated between hearing sentences and sight-reading melodies), there was no evidence of tempo persistence. Dalla Bella et al. suggest that musicians may be better than non-musicians at beat extraction in music, and that this may drive the effect of tempo persistence seen in their first study. Following this logic, I would suggest that the lack of an effect in their second study indicates that speech perception does not involve extraction of a beat.
The interactions of beat, meter, accent, grouping, and expressive timing make musical rhythm a psychologically rich phenomenon (and this is just within the confines of Western European music!). Some idea of this richness is suggested by the work of Gabrielsson, who has conducted studies in which a variety of rhythms are compared and classified by listeners using similarity judgments and adjective ratings (reviewed in Gabrielsson, 1993). Statistical techniques such as multidimensional scaling and factor analysis are used to uncover the perceptual dimensions involved in the experience of musical rhythms. This research has revealed an astonishingly large number of dimensions (15), which group broadly into those concerned with structure (e.g., meter, simplicity vs. complexity), motion (e.g., swinging, graceful), and emotion (e.g., solemnity vs. playfulness). Although much of the cognitive science of musical rhythm focuses on structural issues, it is important to keep the links to motion and emotion in mind, for these connections are part of what distinguishes musical rhythm from speech rhythm, a point to which I will return at the end of the discussion of rhythm in speech.
Although the study of rhythm in poetry has a long history, dating back to ancient Greek and Indian texts, the study of rhythm in ordinary language is a relatively recent endeavor in linguistics. Researchers have taken at least three approaches to this topic. The first approach is typological, and seeks to understand the rhythmic similarities and differences among human languages. The driving force behind this work has been the idea that linguistic rhythms fall into distinct categories. For example, in one widespread typological scheme (discussed in the next section), English, Arabic, and Thai are all members of a single rhythmic class (“stress-timed languages”), whereas French, Hindi, and Yoruba are members of a different class (“syllable-timed languages”). As is evident from this example, membership in a rhythmic class is not determined by the historical relationship of languages; rhythm can unite languages that are otherwise quite distant both historically and geographically.
The second approach to speech rhythm is theoretical, and seeks to uncover the principles that govern the rhythmic shape of words and utterances in a given language or languages. This research, which includes an area called “metrical phonology,” seeks to bring the study of linguistic rhythm in line with the rest of modern linguistics by using formalized rules and representations to derive the observed rhythmic patterning of utterances.
The third approach is perceptual, and examines the role that rhythm plays in the perception of ordinary speech. One prominent line of research in this area concerns the perceptual segmentation of words from connected speech. Another, smaller line of research examines the effects of rhythmic predictability in speech perception.
The goal of this part of the chapter is to introduce each of these areas and make comparisons to musical rhythm when appropriate. Before commencing, it is worth introducing a concept that occurs in each section: the notion of prominence in speech. In many languages, it is normal to produce the syllables of an utterance with differing degrees of prominence. This is true even when a sentence is said with no special emphasis on any particular word. For example, when speaking the following sentence, note how the syllables marked by an x are more prominent than their neighbors:
The most important physical correlates of prominence are duration, pitch movement, vowel quality, and loudness.9 Prominence in speech raises many interesting questions. How many different degrees of prominence can listeners reliably distinguish (Shattuck-Hufnagel & Turk, 1996)? Do languages differ in the extent to which they rely on particular acoustic cues to prominence in production and perception (Berinstein, 1979; Lehiste & Fox, 1992)? Empirical data on these issues is still relatively sparse, and we will not delve into them here. Instead, most sections below treat prominence as a binary quantity referred to as “stress,” following the tradition of much work on speech rhythm. An exception occurs in section 3.3.2, where degrees of prominence are discussed in the context of modern linguistic theories of speech rhythm.
Before embarking on the following sections, a word should be said about the concept of stress in linguistics. Stress is recognized as one aspect of word prosody in human languages; tone and lexical pitch accent are two other aspects. Just as not all languages have lexical tone (cf. Chapter 2 for a discussion of tone languages) or lexical pitch accent,10 not all languages have lexical stress, in other words, a systematic marking of certain syllables within a word as more prominent than others. Importantly, these three aspects of word prosody are not mutually exclusive. For example, there are tone languages with stress (e.g., Mandarin) and without it (e.g., Cantonese), and pitch-accent languages with or without stress (e.g., Swedish and Japanese, respectively; Jun, 2005). Thus in the discussion below, it should be kept in mind that stress is a widespread but not universal feature of human language.
Four approaches to rhythmic typology are described below. Behind all of these approaches is a common desire to understand the relationships of the world’s linguistic rhythms.
The most influential typology of language rhythm to date is based on the notion of periodicity in speech. This typology has its roots in the work of Kenneth Pike (1945), who proposed a theory of speech rhythm based on a dichotomy between languages in terms of syllable and stress patterns. He dubbed certain languages (such as Spanish) “syllable-timed,” based on the idea that syllables mark off roughly equal temporal intervals. These stood in contrast to “stress-timed” languages such as English, which were characterized by roughly equal temporal intervals between stresses. To illustrate stress-timed rhythm, Pike invited the reader to “notice the more or less equal lapses of time between the stresses in the sentence”:
Pike then asked the reader to compare the timing of stresses in the above sentence with the following one, and notice the similarity “despite the different number of syllables” (p. 34):
Pike argued that in stress-timed languages the intervals between stressed syllables (referred to as “feet”) were approximately equal despite changing numbers of syllables per foot. To achieve evenly timed feet, speakers would stretch or compress syllables to fit into the typical foot duration. Pike believed that learning the rhythm of a language was essential to correct pronunciation. He noted, for example, that Spanish speakers learning English “must abandon their sharp-cut syllable-by-syllable pronunciation and jam together—or lengthen where necessary—English vowels and consonants so as to obtain rhythm units of the stress-timing type” (p. 35).
Abercrombie (1967:34-36, 96-98) went further than Pike and proposed a physiological basis for stress versus syllable timing. This bold step was based on a specific hypothesis for how syllables are produced. Abercrombie believed that each syllable was associated with a contraction of muscles associated with exhalation (the intercostal muscles of the rib cage), and that some contractions were especially strong: These latter contractions produced stressed syllables. He referred to these two types of contractions as “chest pulses” and “stress pulses” (thus only some chest pulses were stress pulses; cf. Stetson 1951). Abercrombie proposed that in any given language, one or the other kind of pulse occurred rhythmically. He then equated rhythm with periodicity: “Rhythm, in speech as in other human activities, arises out of the periodic recurrence of some sort of movement. . .” (p. 96). Furthermore, he claimed that “as far as is known, every language in the world is spoken with one kind of rhythm or with the other” (p. 97), naming English, Russian, and Arabic as examples of stress-timed languages, and French, Telugu, and Yoruba as examples of syllable-timed languages. Just as Pike had done, he noted that a language could not be both stress-timed and syllable-timed. Because there are variable numbers of syllables between stresses, equalizing the duration of interstress intervals meant that “the rate of syllable succession has to be continually adjusted, in order to fit varying numbers of syllables into the same time interval.”
It is hard to overestimate the impact of Pike and Abercrombie on the study of rhythm in speech. The terms “stress-timed” and “syllable-timed” have become part of the standard vocabulary of linguistics. A third category, “mora-timing,” is also in standard use, and is used to describe the rhythm of Japanese speech. The mora is a unit that is smaller than the syllable, usually consisting of a consonant and vowel, but sometimes containing only a single consonant or vowel. Ladefoged (1975:224) stated that “each mora takes about the same length of time to say,” thus arguing for the rough isochrony of morae.11 Since the publication of Abercrombie’s book, many languages have been classified into one of these two categories (Dauer, 1983; Grabe & Low, 2002), and many research studies have examined the issue of isochrony in speech. In this sense, the stress versus syllable-timed theory of speech rhythm has been very fruitful. It provided a clear, empirically testable hypothesis together with a physiological justification.
In another sense, however, the theory has been an utter failure. Empirical measurements of speech have failed to provide any support for the isochrony of syllables or stresses (see references in Bertinetto, 1989).12 To take just a few examples from the many papers that have tested the isochrony hypothesis, Dauer (1983) showed that English stress feet grow in duration with increasing number of syllables, rather than maintaining the even duration necessary for isochrony (cf. Levelt, 1989:393). Roach (1982) compared English, Russian, and Arabic to French, Telugu, and Yoruba and demonstrated that the former stress-timed languages could not be discriminated from the latter syllable-timed ones on the basis of the timing of interstress intervals. Finally, Beckman (1982) and Hoequist (1983) showed that morae are not of equal duration in Japanese.
Given that the notion of periodicity in ordinary speech was empirically falsified over 20 years ago, why do the labels of stress-timing, syllable-timing, and mora-timing persist? One reason may be that it matches subjective intuitions about rhythm. For example, Abercrombie himself (1967:171) noted that the idea of isochronous stress in English dates back to the 18th century. Another reason is suggested by Beckman (1992), who argues that this tripartite scheme persists because it correctly groups together languages that are perceived as rhythmically similar, even if the physical basis for this grouping is not clearly understood (and is not isochrony of any kind).
The key point of the current section, then, is that periodicity, which plays such an important role in much musical rhythm, is not part of the rhythm of ordinary speech. The next section explores a different approach to speech rhythm, one that sets aside notions of isochrony.
The fact that speech is not isochronous should not lead us to discard the idea of speech rhythm. That is, research can move forward if one thinks of rhythm as systematic timing, accentuation, and grouping patterns in a language that may have nothing to do with isochrony. One productive approach in this framework is the phonological approach to rhythmic typology. The fundamental idea of this approach is that the rhythm of a language is the product of its linguistic structure, not an organizational principle such as stress or syllable isochrony (Dauer, 1983; cf. Dasher & Bolinger, 1982). In this view, languages are rhythmically different because they differ in phonological properties that influence how they are organized as patterns in time. One clear exposition of this idea is that of Dauer (1983, 1987), who posited several factors that influence speech rhythm.
The first factor is the diversity of syllable structures in a language.13 Languages vary substantially in their inventory of syllable types. For example, English has syllables ranging from a single phoneme (e.g., the word “a”) up to seven phonemes (as in “strengths”), and allows up to three consonants in onset and coda. In sharp contrast, languages such as Japanese (and many Polynesian languages) allow few syllable types and are dominated by simple CV syllables. Romance languages such as Spanish and French have more syllable types than Japanese or Hawaiian but avoid the complex syllables found in languages such as English and Dutch, and in fact show active processes that break up or prevent the creation of syllables with many segments (Dauer, 1987).
The diversity of syllables available to a language influences the diversity of syllable types in spoken sentences. For example, Dauer (1983) found that in a sample of colloquial French, over half the syllable tokens had a simple CV structure, whereas in a similar English sample, CV syllables accounted for only about one-third of the syllable tokens. These differences are relevant to rhythm because syllable duration is correlated with the number of phonemes per syllable, suggesting that sentences of English should have more variable syllable durations (on average) than French sentences.
The second factor affecting speech rhythm is vowel reduction. In some languages, such as English, unstressed syllables often have vowels that are acoustically centralized and short in duration (linguists commonly refer to this sound as “schwa,” a neutral vowel sounding like “uh”). In contrast, in other languages (such as Spanish) the vowels of unstressed syllables are rarely if ever reduced, contributing to a less variable pattern of vowel duration between stressed and unstressed syllables.
The third rhythmic factor proposed by Dauer is the influence of stress on vowel duration. In some languages, stress has a strong effect on the duration of a vowel in a syllable. For example, one recent measurement of spoken English finds that vowels in stressed syllables are about 60% longer than the same vowels in unstressed syllables (Greenberg, 2006). In contrast, studies of Spanish suggest that stress does not condition vowel duration to the same degree (Delattre, 1966).
Dauer suggested that languages traditionally classified as stress-timed versus syllable-timed differ in the above phonological features, with stress-timed languages using a broader range of syllable types, having a system of reduced vowels, and exhibiting a strong influence of stress on vowel duration. This nicely illustrates the perspective of speech rhythm as a product of phonology, rather than a causal principle (e.g., involving periodicity).14
Dauer’s proposal leads to testable predictions. Specifically, the three factors she outlines (diversity in syllable structure, vowel reduction, and the influence of stress on vowel duration) should all contribute to a greater degree of durational variability among the syllables of stress-timed versus syllable-timed utterances. Surprisingly, there are few published data on the durational variability of syllables in sentences of stress- versus syllable-timed languages. One reason for this may be that the demarcation of syllable boundaries in speech is not always straightforward. Although people generally agree on how many syllables a word or utterance has, there can be disagreement about where the boundaries between syllables are, even among linguists. For example, does the first “l” in the word “syllable” belong to the end of the first syllable or to the beginning of the second syllable, or is it “ambisyllabic,” belonging to both syllables? Although it is true that syllable measurements are subject to decisions that may vary from one researcher to the next, this should not impede empirical research: It simply means that measurements should be accompanied by an indication of where each syllable boundary was placed. I return to this point below.
Before turning to another phonological approach to speech rhythm, it is worth noting that the phonological properties listed by Dauer do not always co-occur. Thus Dauer argued against the idea of discrete rhythmic classes and for the notion of a rhythmic continuum. In support of this idea, Nespor (1990) has noted that Polish has complex syllable structure but no vowel reduction (at normal speech rates), and Catalan has simple syllable structure but does have vowel reduction. Thus there is currently a debate in the field of speech rhythm as to whether languages really do fall into discrete rhythm classes or whether there is a continuum based on the pattern of co-occurrence of rhythmically relevant phonological factors (cf. Arvaniti, 1994; Grabe & Low, 2002). Only further research can resolve this issue, particularly perceptual research (as discussed below in section 3.3.1, subsection “Perception and Typology”).
I now turn briefly to a different phonological theory of speech rhythm, proposed by Dwight Bolinger (1981). Although Bolinger focused on English, his ideas are quite relevant to typological issues. The foundation of Bolinger’s theory is the notion that there are two distinct sets of vowels in English: full vowels and reduced vowels. By “reduced” vowels Bolinger does not simply mean vowels in unstressed syllables that are short and acoustically centralized (i.e., a phonetic definition). He argues for a phonological class of reduced vowels in English, which behave differently from other vowels. Bolinger places three vowels in this class: an “ih”-like vowel, an “uh”-like vowel, and an “oh”-like vowel (more similar to “uh” than to the full vowel “o”). Phonetically, all of these vowels occur in the central region of vowel space, near the schwa vowel /ə/ of English (see Figure 2.19: Bolinger’s “ih” and “oh” vowels are not shown in that figure, but the former would occur just to the left and up from /ə/, and the latter just to the right and up from /ə/). Bolinger (1981:3-9) presents arguments to support the notion that these vowels are a phonologically distinct subclass, in other words, that they behave in certain ways that full vowels do not. Space limitations prevent a detailed discussion of these arguments. Here I will focus on two claims Bolinger makes about full and reduced vowels that are relevant for speech rhythm.
First, he claims that syllables containing full and reduced vowels tend to alternate in English sentences. Second, he claims that there is a “lengthening rule” such that “when a long syllable is followed by a short one, the short one borrows time from it and makes it relatively short” (p. 18). (By a “long” syllable, he means a syllable with a full vowel, and by a “short” syllable, he means a syllable with a reduced vowel; there is no claim for a particular duration ratio between the two types of syllables.) To illustrate this rule, Bolinger offers the following example (note that the first sentence is from an ad for a special type of soap):
In the example above, I have indicated the shortened L’s of the second sentence by L– (after Faber, 1986). The point of this example is that each L– of sentence 3.6 is shorter than the L’s of sentence 3.5, and this occurs (according to Bolinger) because each S “borrows time” from the preceding L. Note that sentence 3.6 has strict alternation between L and S. This is a special case: Bolinger makes no claims for strict alternation, only a claim for a tendency (thus sequences such as L L S S S L S L L . . . are perfectly possible). I suspect Bolinger chose the sentences in 3.5 and 3.6 as examples because he felt that each (L– S) pair in sentence 3.6 is not terribly different in duration from each L in sentence 3.5: This is suggested by his graphical placement of the L’s in the two sentences above one another, in his original text. However, the durational equivalence of L and (L– S) is not part of Bolinger’s claim. This is an important point. Bolinger’s theory may be relevant to the subjective impression of isochrony (because of the rough alternation of L and S and the lengthening rule), but it has no isochrony principle.
Faber (1986) argues that Bolinger’s theory is superior to stress-timing theory when it comes to teaching the rhythm of English to foreign students (cf. Chela-Flores, 1994). He also points out that Bolinger’s theory can be used to explain characteristic timing patterns that stress-timing theory cannot account for, such as why “cart” is shorter in:
than in:
Bolinger’s theory of speech rhythm is distinct from the theory outlined by Dauer in that it deals not just with the variability of syllable duration but with the patterning of duration. Specifically, Bolinger argues that the characteristic rhythm of English is due to the rough alternation of syllables with full and reduced vowels, and to the way full vowels change duration when intervening reduced syllables are added. This is already enough to suggest a basis for typological distinctions between languages. For example, one might test the idea that stress-timed languages have more contrast in adjacent vowel durations than do syllable-timed languages, and that stress-timed languages have lengthening rules of the type suggested by Bolinger for English (Bolinger himself does not suggest these ideas, but they are an obvious corollary of his work). If Bolinger had stopped here, he would already have made a valuable contribution to speech rhythm research. Bolinger’s theory has one further component, however, that represents a fundamental divergence from the theory outlined by Dauer.
Once again focusing on English, Bolinger suggested that just as there are two kinds of vowels (full and reduced), there are also two kinds of rhythm. The first is the rhythmic patterning already described, in other words, the rough alternation of long and short syllables and the lengthening rule. Above this level, however, is a second level of rhythmic patterning concerned with temporal relations between accents cued by pitch. Note that this idea entails the notion that syllabic rhythm is fundamentally about duration and does not rely on pitch as a cue. In other words, “there is a basic level of temporal patterning that is independent of tonal patterning” (Bolinger 1981:24, citing Bruce, 1981). Bolinger argues that this temporal patterning would be observable even in speech spoken on a monotone. Speech is not spoken on a monotone, however, and Bolinger argues that syllables accented by pitch form a second level of rhythmic patterning in which the fundamental rule is a tendency to separate pitch accents so that they do not occur too closely together in time. The mechanism for avoiding “accent clash” is to move adjacent accents away from each other, a phenomenon sometimes called “stress-shift” in English. (One oft-cited example of stress shift is when “thirtéen” becomes “thírteen mén”; Liberman & Prince, 1977.) The term “stress-shift” is somewhat unfortunate, because there is evidence that what is shifting is pitch accent, not syllable duration or amplitude (Shattuck-Hufnagel et al., 1994).
The idea that speech rhythm involves temporal patterning at two distinct linguistic levels merits far more empirical research than it has garnered to date. I will return to this idea in section 3.3.4.
Until very recently, the measurement of duration has had a largely negative role in the study of speech rhythm, namely in falsifying claims for the periodicity of stresses or syllables. The insights of the phonological approach, however, have created a new positive role for durational measurements. A key feature of this work has been the abandonment of any search for isochrony, and a focus on durational correlates of phonological phenomena involved in speech rhythm. Ramus and colleagues (1999), inspired by the insights of Dauer, examined the durational patterning of vowels and consonants in speech, based on ideas about how syllable structure should influence this patterning. For example, languages that use a greater variety of syllable types (i.e., stress-timed languages) are likely to have relatively less time devoted to vowels in sentences than languages dominated by simple syllables, due to the frequent consonant clusters in the former languages. By similar reasoning, the durational variability of consonantal intervals in sentences (defined as sequences of consonants between vowels, irrespective of syllable or word boundaries) should be greater for languages with more diverse syllable structures. This latter point is schematically illustrated in 3.11, in which boundaries between syllables are marked with a dot and consonantal intervals are underlined:
Note how the greater diversity of syllable types in 3.11a leads to greater variation in the number of consonants between vowels (likely to translate into greater durational variability of consonantal intervals) as well as a lower vowel to consonant ratio (likely to translate into a lower fraction of utterance duration spent on vowels).
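The two quantities just described can be made concrete with a short sketch. The code below computes %V and ΔC from a list of vocalic and consonantal intervals; all of the interval durations are invented for illustration and are not taken from Ramus et al.'s corpus:

```python
# Sketch of the %V and Delta-C (ΔC) measures of Ramus et al. (1999).
# Input: a list of (type, duration_ms) intervals, where type is
# "V" (vocalic) or "C" (consonantal). All durations are invented.
import statistics

def percent_v(intervals):
    """Percentage of utterance duration occupied by vocalic intervals (%V)."""
    total = sum(d for _, d in intervals)
    return 100.0 * sum(d for t, d in intervals if t == "V") / total

def delta_c(intervals):
    """Standard deviation of consonantal interval durations (ΔC)."""
    return statistics.pstdev(d for t, d in intervals if t == "C")

# Hypothetical utterance with complex syllables (consonant clusters):
complex_syllables = [("C", 120), ("V", 60), ("C", 210), ("V", 70), ("C", 90), ("V", 65)]
# Hypothetical utterance dominated by simple CV syllables:
simple_cv = [("C", 80), ("V", 110), ("C", 85), ("V", 120), ("C", 90), ("V", 115)]

print(percent_v(complex_syllables), delta_c(complex_syllables))
print(percent_v(simple_cv), delta_c(simple_cv))
```

With these invented values, the cluster-rich utterance shows the predicted pattern: a lower %V and a higher ΔC than the CV-dominated utterance.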
These ideas were borne out by empirical measurements. Figure 3.7 (from Ramus et al., 1999) shows a graph with percent of duration occupied by vowels (%V) versus consonantal interval variability (ΔC) within sentences in eight languages. (The data for each language came from 20 sentences read by four speakers, i.e., five sentences per speaker.)
Figure 3.7 Percentage of sentence duration occupied by vowels versus the standard deviation of consonantal intervals within sentences for 8 languages. (CA = Catalan, DU = Dutch, EN = English, FR = French, IT = Italian, JA = Japanese, PO = Polish, SP = Spanish.) Error bars show +/– 1 standard error. From Ramus, Nespor, & Mehler, 1999.
What is interesting about this graph is that languages traditionally classified as stress-timed (English and Dutch) have low %V and high ΔC values, and occupy a different region of the graph than languages traditionally classified as syllable-timed (French, Italian, and Spanish). Furthermore, Japanese, which linguists place in a different rhythmic category (mora-timed), is isolated from the other languages. (The location of Polish and Catalan in this graph is discussed in the next section, on perception.) This demonstrated an empirical correlate of traditional linguistic rhythmic classes, and has inspired other researchers to examine more languages in this framework. One interesting study is that of Frota and Vigário (2001), who examined the rhythm of Brazilian Portuguese versus European Portuguese (henceforth BP and EP). Linguists had often claimed that these two varieties were rhythmically different, with EP being stress-timed, and BP being syllable-timed or having mixed rhythmic characteristics. This makes Portuguese a fascinating topic for speech rhythm research, because one can study sentences with exactly the same words but spoken with different rhythms. (British English and Singapore English provide another such opportunity, because the former is stress-timed and the latter has been described as syllable-timed; see Low et al., 2000.) Frota and Vigário compared sentences spoken by European and Brazilian speakers of Portuguese, and found that EP had a significantly higher ΔC and a lower %V than BP, as predicted by Ramus et al.’s findings.15
One important question about this line of research concerns the perceptual relevance of ΔC and %V. Ramus et al. focused on these measures because of their interest in the role of rhythm in infant speech perception. There is evidence that newborns and young infants can discriminate languages that belong to different rhythmic classes (Mehler et al., 1988, 1996; see also the next section). Mehler and colleagues (1996) have argued that this ability helps bootstrap language acquisition: Once a given rhythmic class is detected, class-specific acquisition mechanisms can be triggered that direct attention to the units that are relevant for segmenting words from connected speech (e.g., stresses in the case of English, syllables in the case of French, as discussed in section 3.3.3, subsection “The Role of Rhythm in Segmenting Connected Speech”). For this theory to work, infants must have some basis for discriminating rhythmic class. Thus Ramus et al. (1999) sought acoustic correlates of rhythmic class that would require minimal knowledge about linguistic units. ΔC and %V are two such parameters, because one need only assume that the infant can distinguish between vowels and consonants (see Galves et al., 2002, for an acoustic correlate of ΔC that does not even require segmentation into vowels and consonants).
One may ask, however, if ΔC and %V are directly relevant to the perception of speech rhythm, or if they are simply correlated with another feature that is more relevant to rhythm perception. That is, one could argue that these measures are global statistics reflecting variability in syllable structure, and are not themselves the basis of rhythm perception in speech (cf. Barry et al., 2003). A more promising candidate for perceptual relevance may be variability in syllable duration, which is likely to be correlated with variability in syllable structure and with vowel reduction. Because the syllable is widely regarded as a fundamental unit in speech rhythm, and because both adults and infants are sensitive to syllable patterning (e.g., van Ooyen et al., 1997), it would be worth examining the corpus of sentences used by Ramus et al. for syllable duration variability, to see if this parameter differentiates traditional rhythmic classes. This would also be a straightforward test of Dauer’s ideas, as the phonological factors she outlines imply that syllable duration variability should be higher in sentences of stress-timed than of syllable-timed languages.
Surprisingly, there has been little empirical work comparing sentence-level variability in syllable duration among different languages. As noted in the previous section, this may reflect the difficulties of assigning syllable boundaries in connected speech. From a purely practical standpoint, it is easier to define phoneme boundaries, using criteria agreed upon by most phoneticians (e.g., Peterson & Lehiste, 1960). However, this should not stop research into syllabic duration patterns, because these patterns are likely to be perceptually relevant. To illustrate both the feasibility and the challenges of a syllable-based approach, examples 3.11c and d below show a sentence of English and French segmented at syllable boundaries (the segmentations were done by myself and Franck Ramus, respectively). Periods indicate syllable boundaries that we felt were clear, whereas square brackets indicate phonemes that seemed ambiguous in terms of their syllabic affiliation. In the latter case, one must decide where to place the syllable boundary. For example, if the phoneme sounds ambisyllabic then the boundary can be placed in the middle of the phoneme, or if it sounds like it has been resyllabified with the following vowel, the boundary can be placed before the phoneme.
(3.11c) The .last .con.cert .gi.ven .at .the .o[p]era .was .a .tre.men.dous .success
(3.11d) Il. fau.dra .beau.coup .plus .d’ar.gent .pour .me.ne[r] à .bien .ce .pro.jet
It is likely that different researchers will vary in how they make these judgment calls. Nevertheless, this is not an insurmountable problem for rhythm research. In fact, if different researchers define syllable boundaries in slightly different ways but nevertheless converge on the rhythmic differences they find between languages, this is strong evidence that the observed differences are robust.16 Figures 3.8a and 3.8b show my markings of syllable boundaries in the waveform and spectrograms of these two sentences (the sentences can be heard in Sound Examples 3.5a and b; note that in sentence 3.5a, “opera” is pronounced “opra”).
Figure 3.8a A sentence of British English segmented into syllables. (Note that “opera” is pronounced “opra” by this speaker.)
For these sentences, the variability of syllable durations as measured by the coefficient of variation (the standard deviation divided by the mean) is .53 for the English sentence and .42 for the French sentence. Making similar measurements on all the English and French sentences in the Ramus database yields the data in Figure 3.8c. As can be seen, on average English sentences have more variable syllable durations than do French sentences (the difference is statistically significant, p < 0.01, Mann-Whitney U test). It would be interesting to have similar variability measurements for numerous languages that have been classified as stress- versus syllable-timed: Would these measurements divide the languages into their traditional rhythmic classes? (See Wagner & Dellwo, 2004, for a promising start.)
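As a concrete illustration of the measure used here, the coefficient of variation can be computed in a few lines. Both duration lists are hypothetical (the actual syllable durations from these sentences are not reproduced in the text), chosen only to mimic an alternating, English-like pattern versus a more uniform, French-like one; the population standard deviation is used:

```python
# Coefficient of variation (CV = standard deviation / mean) of syllable
# durations within a sentence. Durations (ms) are invented for illustration.
import statistics

def coeff_of_variation(durations):
    return statistics.pstdev(durations) / statistics.fmean(durations)

english_like = [90, 260, 110, 310, 95, 240]   # hypothetical alternating pattern
french_like = [140, 170, 150, 160, 145, 165]  # hypothetical uniform pattern

print(round(coeff_of_variation(english_like), 2))
print(round(coeff_of_variation(french_like), 2))
```

The alternating sequence yields a markedly higher CV, the same direction of difference reported for the English versus French sentences above.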
Figure 3.8b A sentence of French segmented into syllables.
Turning now to the ideas of Dwight Bolinger, recall Bolinger’s claim that syllables containing full and reduced vowels tend to alternate in English. This leads to an empirical prediction, namely that the durational contrast between adjacent vowel durations in English sentences should be greater than in languages of a different rhythmic class, such as French or Spanish. In fact, there is research supporting this prediction, though it was not inspired by Bolinger’s work but by an interest in the role that vowel reduction plays in the rhythm of stress- versus syllable-timed languages. Low, Grabe, and Nolan (2000) set out to explore the idea that vowel reduction contributes to the impression of stress-timing via its impact on vowel duration variability in sentences. They tested this idea by examining vowel duration patterning in a stress-timed versus a syllable-timed variety of English (British vs. Singapore English). Crucially, they developed an index of variability that was sensitive to the patterning of duration. Their “normalized pairwise variability index” (nPVI) measures the degree of contrast between successive durations in an utterance. An intuition for the nPVI can be gained by examining Figure 3.9, which schematically depicts two sequences of events of varying duration (the length of each bar corresponds to the duration of the event).
Figure 3.8c The coefficient of variation (CV) of syllable duration in 20 English and 20 French sentences. Error bars show +/– 1 standard error.
In sequence A, neighboring events (e.g., events 1 and 2, 2 and 3) tend to have a large contrast in duration, and hence the sequence would have a large nPVI. Now consider sequence B, which has the same set of durations as sequence A, arranged in a different temporal order. Now neighboring events tend to have low contrast in duration, giving the sequence a low nPVI value. Hence the two sequences have a sharp difference in durational contrastiveness, even though they have exactly the same overall amount of durational variability, for example, as measured by the standard deviation of durations. (See this chapter’s appendix 1 for the nPVI equation.)
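The nPVI equation appears in the chapter's appendix; in its standard formulation (Grabe & Low, 2002) it is 100 times the mean, taken over successive pairs of durations, of the absolute difference divided by the pair's mean. A minimal sketch, using invented sequences analogous to A and B in Figure 3.9:

```python
# nPVI sketch: 100 x the mean over successive duration pairs of
# |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2). Sequences are invented,
# analogous to sequences A and B in Figure 3.9.
import statistics

def npvi(durations):
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

seq_a = [100, 300, 100, 300, 100, 300]  # strict long/short alternation
seq_b = [100, 100, 100, 300, 300, 300]  # same durations, reordered

# Identical overall variability (same multiset of durations)...
assert statistics.pstdev(seq_a) == statistics.pstdev(seq_b)
# ...but very different pairwise contrast:
print(npvi(seq_a), npvi(seq_b))  # → 100.0 20.0
```

As in the figure, reordering the same set of durations leaves the standard deviation untouched while the nPVI drops by a factor of five.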
Figure 3.9 Schematic of sequences of events with varying duration (longer bars = longer durations). See text for details.
Because the nPVI is fundamentally a measure of contrast, the use of the term “variability” in its name is somewhat unfortunate, as variability and contrast are not necessarily correlated, as shown in Figure 3.9. In fact, it is quite possible to have two sequences A and B in which the variability of durations in A is greater than in B, but the nPVI of durations is greater in B than in A (an example is given in section 3.5.1). Thus a better term for this measure might have been the “normalized pairwise contrastiveness index.”
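The dissociation claimed here is easy to instantiate with invented numbers (these are not the sequences of section 3.5.1): in the sketch below, sequence A has ten times the standard deviation of sequence B, yet B has the higher nPVI, because B's small durational differences occur at every step while A's single large difference is averaged over many flat pairs.

```python
# Invented counterexample: A is far more variable than B overall,
# but B has the higher nPVI (greater contrast between successive events).
import statistics

def npvi(durations):
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

seq_a = [100, 100, 100, 500, 500, 500]  # one big jump, otherwise flat
seq_b = [100, 140, 100, 140, 100, 140]  # small but constant alternation

print(statistics.pstdev(seq_a), statistics.pstdev(seq_b))  # → 200.0 20.0
print(round(npvi(seq_a), 1), round(npvi(seq_b), 1))        # → 26.7 33.3
```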
I have delved into the details of the nPVI because it has proven quite fruitful in the study of speech rhythm and in the comparative study of linguistic and musical rhythm (discussed in 3.5.1). Grabe and Low (2002) have used the nPVI to examine the patterning of vowel durations in sentences of a number of languages, and have shown that several languages traditionally classified as stress-timed (such as German, Dutch, British English, and Thai) have a larger vocalic nPVI than a number of other languages traditionally classified as syllable-timed (such as French, Italian, and Spanish). This supports Bolinger’s idea that durational alternation of vowels is important to stress-timed rhythm.17 Inspired by this work, Ramus (2002b) measured the vowel nPVI for all eight languages in his database and found the results shown in Figure 3.10.
Figure 3.10 plots the nPVI for vocalic intervals against the rPVI for intervocalic intervals (i.e., consonantal intervals). (The rPVI, or “raw pairwise variability index,” is computed in the same way as the nPVI but without the normalization term in the denominator; cf. this chapter’s appendix 1. Grabe and Low [2002] argue that normalization is not desirable for consonantal intervals because it would normalize for cross-language differences in syllable structure.) Focusing on the nPVI dimension, the stress-timed languages (English and Dutch) are separated from the syllable-timed languages (Spanish, Italian, and French), which provides additional support for Bolinger’s ideas.18 Furthermore, Polish is now far from the stress-timed languages, which is interesting because there is perceptual evidence that Polish is rhythmically different from these languages due to its lack of vowel reduction (see section 3.3.1, subsection “Perception and Typology”). Japanese is similar to French in terms of nPVI, however, suggesting that nPVI alone is not enough to sort languages into traditional rhythmic classes. Adding a second dimension of rPVI for consonantal intervals, however, does segregate out Japanese, which has very low durational contrast between successive consonantal intervals. This suggests that at least two phonetic dimensions may be needed to capture differences between rhythmic classes (see also Ramus et al., 1999).
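The rPVI just described simply drops the nPVI's normalizing denominator, leaving the mean absolute difference between successive interval durations. A sketch with invented consonantal-interval durations, loosely patterned on the cluster-rich versus mora-timed contrast discussed above:

```python
# rPVI sketch: the "raw" PVI is the mean absolute difference between
# successive interval durations (no normalization by the pair mean).
# Consonantal-interval durations (ms) below are invented for illustration.

def rpvi(durations):
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

cluster_rich = [60, 180, 70, 220, 90]  # hypothetical cluster-rich language
mora_like = [70, 80, 75, 70, 80]       # hypothetical mora-timed language

print(rpvi(cluster_rich), rpvi(mora_like))  # → 127.5 7.5
```

Because the raw differences are not scaled by interval size, the measure preserves the cross-language differences in consonantal interval magnitude that Grabe and Low argue should not be normalized away.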
Figure 3.10 Vocalic nPVI versus Consonantal (intervocalic) rPVI for sentences in eight languages. (CA = Catalan, DU = Dutch, EN = English, FR = French, IT = Italian, JA = Japanese, PO = Polish, SP = Spanish.) Error bars show +/– 1 standard error. From Ramus, 2002a.
One interesting linguistic application of the nPVI has been to the ontogeny of speech rhythm. It has been claimed that the rhythm of English-speaking children is syllable-timed, in contrast to the stress-timed rhythm of adult speech (Allen & Hawkins, 1978). Grabe et al. (1999) conducted an nPVI study that supported this claim. They measured the nPVI of vowels in the speech of English- versus French-speaking 4-year-olds and their mothers. They found that English children had significantly lower nPVI values than their mothers, whereas French children resembled their mothers in having low nPVIs. That is, both English and French children spoke with a syllable-timed rhythm (though the nPVI of the English children was already larger than that of their French counterparts). It would be interesting to track nPVI as a function of age in English and French children, to study the developmental time course of speech rhythm in the two languages.
All nPVI studies to date have focused on a single rhythmic layer in language: the temporal patterning of vowels or consonants. In the spirit of Bolinger’s idea that rhythm may involve multiple levels of temporal organization, it would be worth using the nPVI to explore the relationship of durational patterns at various rhythmically relevant levels in speech (cf. Asu & Nolan, 2006). For example, within English sentences, one could compute the nPVI of interstress intervals (ISIs) relative to the nPVI of syllable durations by measuring both of these quantities in each sentence and then taking the ratio of the former to the latter. It may be that the subjective impression of isochrony in English arises in part from a lower durational contrast between ISIs than between syllables, which would make this ratio significantly less than 1. I return to this idea in section 3.3.4.
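The proposed ratio is easy to compute once both duration series have been measured for a sentence. A sketch with invented durations (the nPVI helper implements Grabe and Low's formula; the numbers are purely illustrative):

```python
def npvi(d):
    # Normalized pairwise variability index (Grabe & Low, 2002).
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(d, d[1:])]
    return 100 * sum(terms) / len(terms)

# Invented durations (ms) for one English sentence:
syllables = [80, 210, 95, 70, 230, 110, 85, 240]  # high durational contrast
isis = [385, 410, 435]                            # interstress intervals, more even

ratio = npvi(isis) / npvi(syllables)
# A ratio well below 1 would be consistent with the idea that ISIs
# contrast less in duration than syllables do.
```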
This section has reviewed a few different acoustic correlates of speech rhythm. Due to the success of this work, it seems certain that more such correlates will be proposed and explored in the future (e.g., Gut, 2005). Ultimately, the usefulness of such measures will depend on whether they group together languages that are perceived as rhythmically similar and divide languages perceived as rhythmically different. Perceptual studies are thus fundamental to research on rhythmic typology, and it is to such studies that we turn next.
All typological theories of language rhythm are ultimately rooted in perception. In the past, linguists have defined rhythm categories (such as stress vs. syllable timing) based on their auditory impressions of languages, and then researchers have sought to identify phonological and acoustic correlates of these classes. The recent success in finding durational correlates of traditional rhythm classes is a testament to the intuition of linguists in their aural rhythmic judgments. However, it is also apparent that the old categorization system has its shortcomings. For example, some languages straddle different categories (e.g., Polish and Catalan, see above), and many languages do not fit neatly into any of the existing categories (Grabe & Low, 2002). Thus the old system is cracking at the seams, and a new science of rhythm classification is called for. Such a science must have as its foundation a body of perceptual data that provides a measure of the rhythmic similarities and differences between languages. These data will allow researchers to construct a perceptual map of language rhythms and determine to what extent the rhythms of human languages fall into distinct clusters (vs. forming a continuum). It will also help suggest new avenues for empirical research into the acoustic foundations of speech rhythm.
Fortunately, perceptual work on the rhythmic differences between languages has already begun. An innovative study by Ramus and Mehler (1999) devised a method for studying the perception of speech rhythm, premised on the idea that if a listener can tell two languages apart when the only cues are rhythmic, then the languages belong to distinct rhythmic classes. Speech resynthesis techniques were used to selectively remove various phonetic differences between languages and focus attention on rhythm. Sound Examples 3.6 and 3.7 illustrate Ramus and Mehler’s technique on a sentence of English and of Japanese. Each sentence is presented in four versions, which convert the original sentence to an increasingly abstract temporal pattern of vowels and consonants. In the first transformation, each phoneme is replaced by a particular member of its class: all fricatives are replaced by /s/, vowels by /a/, liquids (l and r) by /l/, plosives by /t/, nasals by /n/, and glides by /ai/ (a condition they called “saltanaj,” pronounced “sal-tan-ai”). The original intonation of each sentence is preserved. In the second transformation, all consonants are replaced by /s/ and all vowels by /a/ (a condition they call “sasasa”). In the final transformation, the voice pitch is flattened to a monotone, leaving the temporal pattern of vowels and consonants as the only difference between the languages (a condition the authors refer to as “flat sasasa”). Ramus and Mehler found that French adults could discriminate between English and Japanese in all three conditions, supporting the hypothesis that the rhythms of English and Japanese are indeed perceptually distinct.
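The segmental substitutions in the saltanaj and sasasa conditions amount to a many-to-one mapping over phoneme classes. A schematic sketch (the symbolic coding here is a simplification added for illustration; the actual stimuli were built by acoustic resynthesis, not by symbol replacement):

```python
# Class representatives in the "saltanaj" condition:
SALTANAJ = {"fricative": "s", "vowel": "a", "liquid": "l",
            "plosive": "t", "nasal": "n", "glide": "ai"}

# "sasasa" collapses the mapping further: all consonants -> /s/, vowels -> /a/.
SASASA = {cls: ("a" if cls == "vowel" else "s") for cls in SALTANAJ}

def transform(segments, mapping):
    """Replace each (phoneme, class) pair by its class representative,
    preserving the order (and hence the timing pattern) of segments."""
    return "".join(mapping[cls] for _, cls in segments)

# A toy segmented word: "dream" as (phoneme, class) pairs.
word = [("d", "plosive"), ("r", "liquid"), ("i:", "vowel"), ("m", "nasal")]
transform(word, SALTANAJ)  # -> "tlan"
transform(word, SASASA)    # -> "ssas"
```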
Focusing on the flat sasasa transformation, Ramus et al. (2003) also tested French adults’ ability to discriminate the rhythms of English, Polish, Spanish, and Catalan. The results indicated that Polish could be discriminated from the other languages, whereas Catalan could not be discriminated from Spanish, though it was distinct from English and Polish. (Recall that on phonological grounds, Polish and Catalan seemed intermediate between stress-timed and syllable-timed languages; cf. section 3.3.1, subsection “Phonology and Typology.”) These perceptual data suggest that Polish belongs in a separate rhythmic category from English, whereas Catalan belongs in the same category as Spanish. This finding has implications for maps of the acoustic correlates of speech rhythm, such as Figure 3.7. In that map, Polish clustered with stress-timed languages, indicating that a different acoustic dimension is needed to separate perceived rhythmic classes. Indeed, Ramus et al. (1999, 2003) have noted that Polish can be separated from all other languages in their original study on a dimension that measures the variability of vowel duration in a sentence, ΔV, because Polish has very low vowel duration variability compared to all the other languages in their sample.19 Thus perceptual work on speech has already suggested that if one wishes to preserve the notion of rhythm classes in language, at least four classes are needed: stress-timed, syllable-timed, mora-timed (represented by Japanese), and one other yet-to-be-named category represented by Polish.20
Another line of perceptual research concerned with rhythmic typology has focused on newborns and infants. This choice of subjects may seem surprising, but these studies are motivated by the idea that very young humans are sensitive to speech rhythm and use it to guide learning of fine-grained sound patterns of language (Mehler et al., 1996). Nazzi et al. (1998) studied newborn rhythm perception using low-pass filtered speech. This removes most of the phonetic information but preserves syllable, stress, and pitch patterns. They showed that French newborns are able to discriminate English from Japanese, but not English from Dutch, suggesting that the latter are members of the same rhythmic class. They also showed that the newborns could discriminate English and Dutch from Spanish and Italian, but not English and Spanish from Dutch and Italian, suggesting that the former pairings more accurately capture perceptual rhythmic classes (cf. Nazzi et al., 2000, for converging findings with 5-month-old infants). These findings support the authors’ hypothesis that babies can discriminate languages only when they belong to different rhythmic classes, a notion that they dub the “rhythm hypothesis” for language acquisition. If this is true, then the ears of infants may be particularly important instruments in mapping human speech rhythms in future research.21
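Low-pass filtering of this kind is easy to approximate digitally. A sketch using SciPy (the 400 Hz cutoff and filter order are assumptions for illustration; the published studies specify their own filter settings):

```python
import numpy as np
from scipy.signal import butter, lfilter

def lowpass_speech(samples, sr, cutoff_hz=400.0):
    """Low-pass filter a speech waveform, preserving the slow amplitude
    and pitch patterning that carries syllable and stress rhythm while
    removing most fine segmental (phonetic) detail."""
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")  # 4th-order Butterworth
    return lfilter(b, a, samples)

# Usage: filtered = lowpass_speech(samples, sr=16000)
```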
The studies of Ramus, Nazzi, and colleagues raise a number of points for future research on the perception of speech rhythm. First, it is important to design stimuli and tasks that focus attention on those aspects of speech rhythm that play a role in normal speech perception. For example, a danger of the flat sasasa condition is that when a language with a highly variable syllable structure (such as English) is compared to a language dominated by simple syllables (such as Japanese), a salient perceptual difference between the resulting flat sasasa stimuli is the more frequent occurrence of long-duration /s/ sounds in the former stimulus (which results from transforming consonant clusters into single, long /s/ sounds). Thus discrimination could simply be based on listening for frequent long /s/ sounds rather than on attending to temporal structure.
Second, research on the perceptual taxonomy of language rhythms should not only be based on discrimination tasks, but should also incorporate similarity judgments. Musical studies of rhythmic similarity provide a good model for such research (Gabrielsson, 1973, 1993). In this research, rhythms are presented in a pairwise fashion and listeners rate their perceived similarity using a numerical scale. The resulting ratings are studied using multidimensional scaling to uncover perceptual dimensions used by listeners in classifying rhythms. This paradigm could easily be adapted to study speech rhythm, using low-pass filtered speech with minimal pitch variation as stimuli. Such studies should be sensitive to the idea that the important perceptual dimensions for rhythm may be relational, for example, a high contrastiveness between successive syllable durations while simultaneously having a lower durational contrastiveness between interstress intervals (cf. the end of section 3.3.1, subsection “Duration and Typology”).
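The analysis step can be sketched with classical (Torgerson) MDS, which embeds items from a matrix of pairwise dissimilarities; the ratings below are invented purely to illustrate the pipeline:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: find coordinates whose pairwise
    distances approximate the dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dims]       # keep the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

# Invented averaged dissimilarity ratings (0 = rhythmically identical):
langs = ["EN", "DU", "SP", "IT", "JA"]
D = np.array([[0, 1, 5, 5, 7],
              [1, 0, 5, 5, 7],
              [5, 5, 0, 1, 6],
              [5, 5, 1, 0, 6],
              [7, 7, 6, 6, 0]], dtype=float)

coords = classical_mds(D)
# In the resulting 2-D map, EN/DU and SP/IT form clusters, with JA apart.
```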
Finally, a fundamental issue for all future studies of rhythmic typology is the extent to which perceived rhythmic similarities and differences between languages depend on the native language of the listener. The theory of stress, syllable, and mora timing was proposed by native English speakers, and it is an open question whether speakers of other languages perceive rhythmic cues in the same way that English speakers do. For example, it has recently been demonstrated that French listeners have some difficulty distinguishing nonsense words that differ only in the location of stress, whereas Spanish listeners have no such difficulty (Dupoux et al., 2001). This likely reflects the fact that Spanish has contrastive stress: Two words can have the same phonemes but a different stress pattern, and this can change the meaning of the word entirely (e.g., sábana vs. sabána, which mean “sheet” and “savannah” respectively; cf. Soto-Faraco et al., 2001). French does not have this property, and Dupoux et al. suggest that this difference is responsible for the “stress deafness” they found in their French listeners. Results such as this raise a fundamental question: Is there a single map of perceived rhythmic similarities and differences among languages, or does the geography of the map differ according to the native language of the listener? Only empirical work can resolve this issue, but it seems a real possibility that native language influences the perception of rhythmic relations between languages.
For those interested in comparing rhythm in language and music, it is important to be familiar with a branch of theoretical linguistics known as “metrical phonology.” Metrical phonology deals with speech rhythm, but it does so in a manner quite different from the approaches described so far. First and foremost, rhythmic prominence is treated as hierarchical. That is, prominence is incrementally assigned at each level of the prosodic hierarchy according to systematic principles. For example, in a given theory it may be the case that all syllables begin with a basic amount of prominence, then the lexically stressed syllable of each word (or clitic group) is assigned an additional degree of prominence, then a phrase-level prominence is added to a particular word of a phrase (e.g., the “nuclear stress rule” in English), and so on. In this view, prominence is not simply a binary phonetic feature called “stress” that syllables either have or do not. Rather, prominence is an acoustic projection of the hierarchical prosodic structure of an utterance, and as such, has several degrees that serve to indicate a syllable’s position in an utterance’s rhythmic hierarchy (Halle & Vergnaud, 1987; Halle & Idsardi, 1996; Shattuck-Hufnagel & Turk, 1996).22
One of the clearest expositions of metrical phonology is in Selkirk’s 1984 book, Phonology and Syntax: The Relation Between Sound and Structure. One goal of this book is to show how one can go from a string of words to a representation of the syllabic prominence pattern of the spoken utterance in a rule-governed fashion. The relative prominence of syllables is represented using a “metrical grid” that treats each syllable as a point in abstract time (Figure 3.11), meaning that prominence patterns are considered without regard to their exact timing.
Two aspects of the linguistic metrical grid, introduced by Liberman (1975), embody “the claim that the rhythmic organization of speech is quite analogous to that of music” (Selkirk, 1984:9). First, as noted above, prominence is treated hierarchically, analogously to hierarchical theories of musical meter (Cooper & Meyer, 1960; Lerdahl & Jackendoff, 1983). Above the basic level of the syllable are several other levels. The second level marks stressed syllables, and is the level of the basic “beat,” in analogy to the tactus in music (Selkirk, 1984:10, 40). The third level marks the primary lexical stress of each word, and the fourth level marks the main accent of each phrase. This “text-to-grid” assignment of beats provides the input on which rhythmic principles operate. These principles, which represent the second link to musical meter, amount to a tendency to alternate between stronger and weaker elements at each level of the hierarchy. The principles are enforced by rules that can add, delete, and move beats to make the pattern at each level more congruous with an alternating pattern. For example, a rule of “beat addition” might add a beat at the second level to avoid a long series of unstressed syllables. At the third level, a rule of “beat movement” might shift the primary stress/accent of a word to avoid the adjacency of primary lexical stresses/accents (as when “thirtéen” becomes “thírteen mén”; Liberman & Prince, 1977; Shattuck-Hufnagel et al., 1994; Grabe & Warren, 1995). The ideal goal is that strong beats at any given level are separated by no more than two weak beats at that level (the “principle of rhythmic alternation”). Thus metrical phonology derives the prominence pattern of a sentence using ideas directly inspired by theories of musical meter.
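The flavor of such grid-adjusting rules can be conveyed with a deliberately toy sketch: prominence as an integer grid level per syllable, and a “beat addition” rule that breaks up long runs of weak syllables. (Real metrical rules are sensitive to word and phrase structure, not just linear position; this simplification is mine.)

```python
def add_beats(levels, max_weak=2):
    """Toy 'beat addition': raise a weak syllable (level 0) to a beat
    (level 1) whenever more than max_weak weak syllables occur in a row,
    per the principle of rhythmic alternation."""
    levels = list(levels)
    run = 0
    for i, p in enumerate(levels):
        if p == 0:
            run += 1
            if run > max_weak:
                levels[i] = 1   # promote the syllable that creates the lapse
                run = 0
        else:
            run = 0
    return levels

# A lapse-heavy prominence pattern (1 = stressed, 0 = unstressed):
grid = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
add_beats(grid)  # -> [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
```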
Figure 3.11 A metrical grid for a sentence of English. From Selkirk, 1984.
The notion that speech has multiple rhythmically relevant levels is an interesting abstract similarity between rhythm in language and music, because a fundamental property of musical meter is the existence of perceptually salient temporal patterning on multiple timescales (cf. section 3.2.2). Furthermore, just as musical meter involves at least one psychologically accessible rhythmic level below the beat and one or two levels above it, metrical phonology proposes rhythmic levels below and above the “beat” of stressed syllables. That is, both theories concern the patterning of time intervals at several timescales.
Although the picture painted by metrical phonology is an elegant one, it should be noted that its claims are by no means universally accepted by speech scientists (Cooper & Eady, 1986), and that the patterns of prominence it proposes are typically constructed from the intuitions of linguists rather than from acoustic and perceptual data collected in laboratory settings. However, there is hope that the field can be put on an empirical footing. For example, there is phonetic evidence for four degrees of prominence in speech (at least in stress-timed languages), corresponding to reduced vowels, full vowels, stressed syllables, and accented syllables (Terken & Hermes, 2000). For the current purposes, metrical phonology is interesting because it draws attention to a number of issues in which comparisons of linguistic and musical rhythm are instructive. One of these issues (multiple layering in rhythmic structure) leads to ideas for empirical comparative studies of rhythm in speech and music, and is discussed further in section 3.3.4. Two other issues are discussed below.
Although the hierarchies posited by metrical phonology were inspired by Western music, some very important differences between the meters of music and language are readily apparent. Most notably, temporal periodicity in musical meter is much stricter than anything found in speech, and this difference has dramatic cognitive consequences. The regular periodicities of music allow meter to serve as a mental framework for sound perception, such that an event can be perceived as metrically prominent even if it is physically quite weak, as in syncopated rhythms. By comparison, the prominences of language are not regular enough to allow for anything as abstract as syncopation. As a result, linguistic metrical grids are not abstract periodic mental patterns (like musical metrical grids) but are simply maps of heard prominences, full of temporal irregularities. For example, Dauer (1983) reports that the average interstress interval in speech was around 450 ms, with a standard deviation of approximately 150 ms. Dividing the standard deviation by the mean yields a coefficient of variation of about 33%. This variability is markedly different from music, in which perceived beats occur in a much more evenly spaced fashion. For example, when tapping to music, adults show a coefficient of variation of about 5%. Thus “metrical grids” in language should perhaps be called “prominence grids,” to avoid the implication of an abstract mental periodicity.
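The comparison of timing variability in the two domains comes down to a coefficient of variation. A small sketch with invented interval series, chosen to match the magnitudes cited above:

```python
from statistics import mean, stdev

def cv(intervals):
    """Coefficient of variation (%): standard deviation / mean * 100."""
    return 100 * stdev(intervals) / mean(intervals)

# Invented interstress intervals (ms) with speech-like variability:
speech_isis = [300, 620, 410, 545, 350, 475]
# Invented inter-tap intervals (ms) while tapping to a musical beat:
tap_intervals = [498, 510, 492, 505, 500, 495]

# cv(speech_isis) is roughly an order of magnitude larger than
# cv(tap_intervals), as in the speech vs. tapping figures cited above.
```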
Setting aside questions of temporal periodicity, one can ask if speech and music share a more abstract similarity in terms of a tendency to arrange prominences in alternating patterns of strong and weak elements. If so, this might suggest a basic cognitive relationship between rhythm in language and music. Evidence in favor of a principle of alternation in language comes from studies showing that English speakers adjust prominence patterns to make them more regular. For example, Kelly and Bock (1988) had speakers pronounce nonsense words embedded in sentences, such as:
(3.12a) The full teplez decreased.
(3.12b) Throw the teplez badly.
The focus of interest was whether speakers stressed the first or second syllable of the nonsense word. Overall, speakers tended to stress the first syllable, in accordance with a general trend in English for disyllabic nouns to have initial stress. However, this tendency was significantly weaker when the nonsense word was preceded by a stressed syllable (as in sentence 3.12a), as if speakers wanted to spread out the occurrence of stresses. Further evidence for alternation of stress patterns in speech production comes from Cutler (1980), who examined sentences in which speakers inadvertently omitted a syllable, such as:
versus the intended sentence:
Much more often than chance, the errors served to shorten a long run of unstressed syllables, thus tending to promote the alternation of stressed and unstressed syllables.
Although these findings seem to support a positive principle of alternation, it is also possible that they reflect the action of a negative principle that seeks to break up clusters of prominent syllables or clusters of nonprominent syllables (i.e., “stress clashes” and “stress lapses,” Nespor & Vogel, 1989). Some support for the latter view comes from the observation that the regularizing tendencies reported by Kelly and Bock (1988) and Cutler (1980) are actually rather weak. In the former study, subjects placed initial stress on the target nonsense word in the majority of cases, whether or not the immediately preceding syllable was stressed. The presence of a prior stressed syllable simply lowered the proportion of initial stress from 80% to 70%, suggesting only a mild tendency to maintain an alternating stress pattern in speech. Similarly, Cutler’s study is based on collecting relatively rare syllable omission errors, meaning that speakers usually manage quite well with irregular prominence patterns.
Thus at the current time it is impossible to rule out the hypothesis that the tendency to alternate stronger and weaker syllables in speech is the result of nonrhythmic forces that seek to keep prominences at a comfortable distance from each other. In fact, research on Greek suggests that the alternation of prominence may not even be a universal pattern for human languages, because Greek tolerates long sequences of unstressed syllables (Arvaniti, 1994). Thus it may be that the only universal principle regarding prominence patterns in language is that prominences that are too close together are subject to linguistic mechanisms for clash avoidance (see Arvaniti, 1994, for evidence from Greek; and Nespor, 1990, for references to studies of clash avoidance mechanisms in numerous languages). The reason such mechanisms exist may ultimately be rooted in the mechanics of articulation. Stressed syllables tend to be made with larger jaw movements than unstressed syllables (de Jong, 1995), and it may be biomechanically advantageous to avoid crowding larger jaw movements together when speaking at the fast rates typical of normal conversation.
The role of rhythm in speech perception has been the focus of at least four different lines of research. Two of these have obvious conceptual connections to musical rhythm: the study of perceived isochrony in speech and the investigation of the role of rhythmic predictability in perception. The third line has investigated the role of speech rhythm in the perceptual segmentation of words from connected speech. Although not obvious at first, this research is in fact quite pertinent to comparative perceptual studies of rhythm in language and music. The final (and most recent) line of work concerns the role that rhythm plays in perception of nonnative accents. Although no conceptual link has yet been made between this work and music research, it is briefly described because it is a promising new area for empirical work on speech rhythm.
As noted in section 3.3.1 (subsection “Periodicity and Typology”), the idea that linguistic rhythm involves regular temporal intervals (e.g., between stresses or syllables) has received no empirical support from measurements of speech. However, all such measurements have been based on data from speech production, in other words, on waveforms or spectrograms of spoken utterances. In an influential paper, Lehiste (1977) made the interesting suggestion that periodicity may be stronger in perception than in production. That is, the ear may ignore or compensate for surface irregularities in judging periodicity in speech. She based this idea on empirical work in which she examined the ability of listeners to identify the longest or shortest interstress interval (ISI) in short sentences of four ISIs, and to do the same task on nonspeech analogs of the sentences in which the stresses were replaced by clicks and the speech by noise. She found that listeners performed better in the nonlinguistic condition, and suggested that if listeners had difficulty judging ISI duration differences in speech, this would lead to a sense that ISIs were similar in duration, in other words, an impression of isochrony. (Of course, it may not be speech per se that makes small duration differences difficult to detect; it could be that the presence of semantic meaning in language preoccupies the listener’s attention so that fine duration judgments are difficult. Thus an important control condition for future studies of this sort is to use a language with which the listener is unfamiliar.)
Lehiste went on to study the just noticeable difference (JND) in duration for sequences of four noise-filled intervals, reasoning that this would establish a conservative estimate of the JNDs for ISIs in speech. She used three basic reference durations in her noise sequences (300, 400, and 500 ms). In each sequence, three of the four intervals had the same duration, and the fourth was increased or decreased in nine 10-ms steps. She found that reliable judgments identifying one interval as longer or shorter than the others required changes of between 30 and 100 ms. She argued that JNDs for ISIs in speech are no better than this and are likely to be worse, and thus that physical measurements of isochrony need to take this “perceptual tolerance” into account (cf. Kristofferson, 1980).
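Lehiste's four-interval design lends itself to a short sketch. The following Python fragment is a toy reconstruction of the stimulus timing only; the trial count and the random placement of the deviant interval are assumptions, as the full design of the original paper is not reproduced here:

```python
import random

def lehiste_sequence(reference_ms, deviant_step, deviant_position):
    """Timing of one Lehiste-style trial: four noise-filled intervals,
    three at the reference duration and one lengthened or shortened by
    a multiple of 10 ms (deviant_step in -9..9)."""
    intervals = [reference_ms] * 4
    intervals[deviant_position] += deviant_step * 10
    return intervals

# One timing pattern per combination of reference duration and deviant
# size, with the deviant placed at random (an assumption for this sketch).
trials = []
for ref in (300, 400, 500):        # Lehiste's three reference durations
    for step in range(-9, 10):
        if step == 0:
            continue               # a no-change sequence has no deviant
        pos = random.randrange(4)
        trials.append(lehiste_sequence(ref, step, pos))

print(len(trials))  # 3 references x 18 deviant sizes = 54 patterns
```

An actual experiment would then fill each interval with noise and mark the interval boundaries with clicks; the sketch generates only the durations.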
Lehiste’s work is interesting because it raises the possibility that listeners hear more isochrony than is really there in speech. Some evidence offered in favor of this argument comes from Donovan and Darwin (1979), who had individuals listen to English sentences and then imitate the timing of each sentence’s stress pattern by tapping. The subjects also performed this task with sequences of noises whose timing mimicked the stress pattern of sentences. The critical finding was that when imitating speech, subjects tapped with less temporal variability than the actual timing of stressed syllables, whereas when imitating noise they did not show this pattern.
Although these findings are intriguing, further work has suggested that this paradigm may be flawed. Scott et al. (1985) replicated the findings of Donovan and Darwin for English, but also found that subjects showed regularization of tapping to French (which is not considered to have periodic stress), as well as to garbled speech. This suggests that the observed regularization may be a side consequence of the greater difficulty of remembering acoustically complex stimuli versus noise patterns.
Thus Lehiste’s ideas merit further investigation, because they raise the important issue of how perceived timing patterns relate to the physical intervals measured in speech. Nevertheless, there is nothing in Lehiste’s work that supports the idea that speech is perceived as isochronous under ordinary circumstances. As noted in section 3.3.2 (subsection “Differences Between Linguistic and Musical Metrical Grids”), the variability of ISI durations in speech is on the order of 33%. For a 500-ms average ISI, this is 150 ms, which is above the threshold for subjective isochrony suggested by Lehiste.
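The arithmetic behind this comparison is easy to verify. The sketch below computes the coefficient of variation for a set of illustrative, invented ISI durations whose spread is roughly a third of a 500-ms mean, and converts it back to a typical deviation in milliseconds for comparison against Lehiste's 30-100 ms perceptual tolerance:

```python
def coefficient_of_variation(durations_ms):
    """Population standard deviation of the durations, as a fraction
    of their mean."""
    n = len(durations_ms)
    mean = sum(durations_ms) / n
    var = sum((d - mean) ** 2 for d in durations_ms) / n
    return (var ** 0.5) / mean

# Illustrative (invented) ISIs with ~33% variability around a 500-ms mean:
isis = [250, 350, 500, 650, 750, 500]
cv = coefficient_of_variation(isis)
expected_deviation_ms = cv * 500   # typical departure from the mean ISI
print(round(cv, 2), round(expected_deviation_ms))  # → 0.34 168
```

A typical deviation on the order of 150-170 ms is well above the 30-100 ms changes Lehiste found listeners could reliably detect, which is why the measured variability of speech cannot be dismissed as perceptually invisible.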
Lehiste’s ideas do point in one interesting direction in terms of comparative studies of speech and music, namely a direct comparison of the threshold for detecting temporal irregularities in perceptually isochronous sequences (e.g., a repeating syllable “ta ta ta ta”) versus a repeating musical sound of equivalent acoustic complexity. If speech can tolerate more durational variability than music and still sound isochronous, this raises interesting questions about different mechanisms for time perception in the two domains.
One could also examine the threshold for detecting tempo change in isochronous sequences in speech and music. Current data suggest that for nonmusician listeners, the threshold for tempo change detection in sequences of musical sounds is 5%-8% (Drake & Botte, 1993; cf. Rivenez et al., 2003). Would the threshold be higher if speech sounds were used?
A number of researchers have argued that the ability to predict the location of stressed syllables in English is perceptually beneficial (e.g., Martin, 1972; Shields et al., 1974; Cutler & Foss, 1977). The reasoning behind this idea is based on certain assumptions: Stressed syllables carry important semantic information, and a listener’s attention is limited, so that it is useful to expend attentional resources on those points in time where stresses occur. Thus the ability to anticipate stress location can help guide attention in an efficient manner. This idea suggests a point of contact between rhythm perception in speech and music, because there are theories linking rhythm and attention in music psychology (Jones, 1976; Large & Jones, 1999; Barnes & Jones, 2000). In order to determine if this is really a meaningful parallel, however, two questions must be answered. First, is there evidence that rhythmic predictability plays an important role in speech perception? Second, are the mechanisms for rhythmic prediction similar in speech and music?
The best evidence for a role of rhythmic predictability in speech perception comes from studies using phoneme-monitoring tasks. In these experiments, listeners are told to listen to one sentence at a time and press a button when they hear a target phoneme (such as /d/). Cutler and Darwin (1981) conducted a study in which sentences were recorded that had high, low, or neutral emphasis on a given target word. For example, sentences 3.14a and b below were used to record high versus low emphasis on the word “dirt” (in the sentences below, the word bearing the main emphasis of the sentence is italicized):
(3.14a) She managed to remove the dirt from the rug, but not the grass stains.
(3.14b) She managed to remove the dirt from the rug, but not from their clothes.
Cutler and Darwin then spliced the neutral version of the target word into high and low emphasis sentences, so that the target phoneme /d/ (and the rest of the word that began with this phoneme) were acoustically identical in the two cases. A faster reaction time to the target phoneme in high-emphasis sentences would indicate that the prediction of stress was influencing speech processing. This is precisely what was found: Listeners were reliably faster in detecting the target phoneme in high-stress sentences. Of particular interest is that this difference persisted even when fundamental frequency variation was removed from the two sentence types, suggesting that patterns of duration and amplitude were sufficient to predict the upcoming stress.
Cutler and Darwin’s study focused on target words that either did or did not bear the main contrastive stress of the entire sentence. That is, they were not studying the perception of just any stressed syllable, but of a particularly salient stressed syllable in a sentence. Pitt and Samuel (1990) conducted a study in which the context manipulation was not so extreme: They used sentences that predicted stress or nonstress at a target point due to rhythmic and syntactic factors, for example, the first syllable of the word “permit” in:
(3.15a) The guard asked the visitor if she had a permit to enter the building.
(3.15b) The waiter decided he could not permit anyone else in the restaurant.
In sentence 3.15a, the context leads one to expect a stressed syllable at the target location, both because the syntax of the sentence predicts a noun (a word category that tends to start with a stressed syllable in English) and for the rhythmic reason that the prior stress in the sentence is quite far away (the first syllable of “visitor”). In sentence 3.15b, the context leads one not to predict a stressed syllable, both because the syntax predicts a verb (a word category that tends to start with a weak syllable in English), and because the prior stress is quite nearby (on “could” or “not”).
Like Cutler and Darwin, Pitt and Samuel used a splicing technique to ensure that the physical target word was the same in the two contexts, and asked listeners to respond when they heard a target phoneme (e.g., /p/ in the above example). Unlike Cutler and Darwin, however, they found no significant difference in reaction time to the target phoneme as a function of the preceding context. Thus it appears that although rhythm may help listeners predict sentence-level emphasis, it does not play a strong role in predicting lexical stress, even when reinforced by syntax. This casts some doubt on the idea that rhythm plays an important role in guiding attention to the majority of stressed syllables in spoken sentences. Clearly, more work is needed to determine to what extent stress is predictable under normal circumstances.
Even if a significant role for rhythmic prediction in speech is demonstrated, however, it is quite possible that the mechanisms that underlie rhythmic prediction in speech and music are quite different. In music, rhythmic predictability reflects the periodic structure of temporal intervals. In speech, the basis for rhythmic predictability (e.g., predicting when a stress will occur) is unlikely to involve periodic time intervals, because there is no evidence that such intervals exist in normal speech. A first step in studying the basis of rhythmic prediction in speech would be to study the stimuli used by Cutler and Darwin (1981), especially those stimuli in which fundamental frequency variation was removed. What temporal and/or amplitude patterns helped guide listeners’ expectations in that study?
Thus at the current time, the hypothesis that rhythmic predictability in speech confers an advantage by guiding attention to semantically important parts of utterances is not well supported by empirical evidence. In music, it is clear that rhythmic predictability has an adaptive value: it allows the formation of a temporal expectancy scheme that plays an important role in musical perception (e.g., beat perception), and it guides the coordination of ensemble performance and the synchronization of movements in dance. Because speech does not have a regular beat, what functional role would rhythmic predictability play? One idea suggested by Lehiste (1977) is that it plays a role in signaling phrase boundaries in speech. Specifically, she suggested that one method speakers have for disambiguating syntactically ambiguous sentences is by signaling a structural boundary via lengthening of an interstress interval (ISI). For example, she studied speakers’ productions of sentences such as “The old men and women stayed at home,” which is syntactically ambiguous (either just the men were old or both the men and women were old). She found that when speakers said this sentence in such a way to make one or the other interpretation clear, the sequence “men and women” was very different in duration, being substantially longer when a syntactic boundary was intended between “men” and “women.” Furthermore, she conducted a follow-up study in which the same sentences were resynthesized using a monotone, and the duration of the critical ISI was manipulated by uniformly expanding the duration of the phonemes within it, so that the relative durations of segments remained the same. She found that listeners were able to perceive the intended meaning solely on the basis of the length of the critical ISI, suggesting that ISI duration can signal a phrase boundary.
A subsequent study by Scott (1982) set out to test Lehiste’s hypothesis against the more conventional notion that phrase boundaries are signaled by phrase-final lengthening. She found evidence for a weak version of Lehiste’s hypothesis, in that there appeared to be some cases in which listeners used ISI patterns, but others in which they relied on traditional phrase-final lengthening. Nevertheless, the evidence was suggestive enough for this line of research to merit further study. A key conceptual point, however, is that evidence that ISI duration plays a role in creating perceived boundaries in speech is not equivalent to evidence for isochrony. The expectation for how long an ISI should be need not be based on an expectation for isochrony, but could be based on expectations for how long a given ISI should be given the number (and type) of syllables within it and the current speech rate (cf. Campbell, 1993). According to this view, a prosodic break is more likely to be heard when an ISI is significantly longer than expected, and rhythmic predictability is simply implicit knowledge of the statistical relation between ISI duration and the number and type of syllables in an ISI. This would allow a functional role for rhythmic predictability without any recourse to notions of isochrony.
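This expectation-based account can be made concrete with a deliberately simple model. In the sketch below, the linear coefficients, the rate factor, and the 30% lengthening threshold are all hypothetical placeholders, not values from Campbell (1993) or Lehiste; the point is only the logic of flagging a boundary when an ISI greatly exceeds its expected duration:

```python
def expected_isi_ms(n_syllables, rate_factor, base=80, per_syllable=120):
    """Hypothetical linear model of expected interstress-interval
    duration: a base cost plus a per-syllable cost, scaled by the
    current speech rate (rate_factor > 1 = slower speech).
    All coefficients are illustrative."""
    return rate_factor * (base + per_syllable * n_syllables)

def boundary_likely(observed_ms, n_syllables, rate_factor, tolerance=1.3):
    """Flag a prosodic break when the observed ISI is much longer than
    expected (here, 30% longer; this threshold is an assumption)."""
    return observed_ms > tolerance * expected_isi_ms(n_syllables, rate_factor)

print(boundary_likely(520, 3, 1.0))  # 520 ms vs. expected 440 ms → False
print(boundary_likely(700, 3, 1.0))  # well beyond expectation → True
```

Note that no isochrony assumption appears anywhere in the model: the expectation is conditioned on syllable count and rate, not on a regular beat.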
To a native listener, spoken sentences consist of a succession of discrete words, yet this perception is an illusion. As pointed out in Chapter 2, word boundaries in language do not map in any simple way onto acoustic breaks in the speech signal, and as anyone who has listened to sentences in a foreign language can attest, it is far from obvious where the word boundaries in connected speech are. This problem is particularly relevant for infants, who are constantly faced with multiword utterances (van de Weijer, 1999) and who do not have the benefit of an existing vocabulary to help them identify where one word ends and the next begins.
A substantial body of research in psycholinguistics indicates that the rhythmic properties of a language assist a listener in segmenting speech. Work on English, for example, has pointed to a segmentation strategy based on stress: Listeners expect strong syllables to be word-initial. This likely reflects the predominance of words with initial stress in the English lexicon (Cutler & Carter, 1987), and manifests itself in a number of different ways in perception. For example, Cutler and Butterfield (1992) showed that when listeners missegment speech, they tend to place word boundaries before stressed syllables, as when “by loose analogy” is misheard as “by Luce and Allergy.” Furthermore, when English speakers are asked to spot real monosyllabic words embedded in larger polysyllabic nonsense words, they find it easier when the real word does not straddle two stressed syllables. “Mint,” for example, is easier to spot in “mintef” than in “mintayf,” presumably because in the latter word the strong second syllable “tayf” triggers segmentation, thus splitting “mint” into two parts (Cutler & Norris, 1988). Cutler (1990) has dubbed the strategy of positing a word onset at each strong syllable the “metrical segmentation strategy.”
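The metrical segmentation strategy itself is simple enough to state as a toy procedure: posit a word onset at every strong syllable. The sketch below uses hand-annotated stress marks as input (a hypothetical representation; real models operate on richer phonological material) and illustrates why “mint” survives intact in “mintef” but is split in “mintayf”:

```python
def metrical_segmentation(syllables):
    """Toy metrical segmentation strategy (after Cutler, 1990): start a
    new candidate word at every strong syllable. Input is a list of
    (syllable_text, is_strong) pairs; output is candidate word chunks."""
    words, current = [], []
    for text, strong in syllables:
        if strong and current:   # a strong syllable opens a new word
            words.append(current)
            current = []
        current.append(text)
    if current:
        words.append(current)
    return ["".join(w) for w in words]

# Weak second syllable: no boundary, "mint" stays whole.
print(metrical_segmentation([("min", True), ("tef", False)]))   # ['mintef']
# Strong second syllable triggers a boundary, splitting "mint".
print(metrical_segmentation([("min", True), ("tayf", True)]))   # ['min', 'tayf']
```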
Research on segmentation in other languages has revealed that stress-based segmentation is by no means universal. French and Spanish speakers, for example, favor syllabically based segmentation (Mehler et al., 1981; Pallier et al., 1993), whereas Japanese speakers favor moraic segmentation (Otake et al., 1993). Thus segmentation relies on units that are phonologically important in the native language. One striking finding of this cross-linguistic research is that the native language’s segmentation strategies are applied even when listening to a foreign language, showing that segmentation tendencies are not simply a reaction to a particular speech rhythm, but a perceptual habit of a listener (see Cutler, 2000, for a review). One possibility suggested by Cutler is that this habit is a residue of early language learning, when rhythmic segmentation played an important role in bootstrapping lexical acquisition.
The relevance of this research to comparative studies of language and music is that it shows that experience with a language’s rhythm leaves a permanent influence on a listener in terms of segmenting speech patterns, whether or not these patterns come from the native language. From this observation it is but one step to ask if experience with the native language influences how one segments non-linguistic rhythmic patterns. This question is taken up in section 3.5.2 below.
When people listen to their native language, they usually have a keen sense of whether or not it is being spoken with a native accent. Recent research on speech rhythm has taken advantage of this fact by having listeners judge the degree of perceived “foreign accentedness” in utterances spoken by nonnative speakers. Empirical rhythmic measurements are then taken of the speech of the different nonnative speakers. By examining the correlation between the perceived degree of foreign accent and the quantitative rhythmic measures, researchers hope to identify the perceptual cues listeners use in gauging speech rhythm patterns.
Using this approach, White and Mattys (2007) examined Spanish speakers of English and found that the greater their vowel duration variability within sentences, the more native-sounding they were rated by English speakers. This probably reflects vowel reduction: Spanish speakers who learn to reduce vowels in unstressed syllables (a characteristic of English, but not of Spanish; cf. section 3.3.1, subsection “Phonology and Typology”) are likely to sound more like native speakers. A consequence of vowel reduction within sentences is that vowel duration variability increases, because some vowels become very short.
As noted in section 3.3.1 (subsection “Duration and Typology”), another empirical measure of rhythm influenced by vowel reduction is the nPVI, which measures the degree of durational contrast between adjacent vowels in a sentence rather than overall durational variability. White and Mattys found that the vowel nPVI of Spanish speakers of English was positively correlated with how native they sounded. Crucially, however, vowel duration variability was a better predictor of accent judgment than was nPVI. This suggests that vowel duration variability may be more perceptually relevant for speech rhythm than durational contrastiveness.
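Both competing measures are easy to compute. The nPVI follows Grabe and Low (2002): 100 times the mean, over successive pairs of durations, of the absolute difference divided by the pair's mean. The sketch below contrasts it with overall variability (standard deviation over mean) on invented duration sequences, showing that the two measures can dissociate: a gradual ramp retains much of the overall spread while its pairwise contrast collapses.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index (Grabe & Low, 2002):
    mean normalized difference between successive durations, x 100."""
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

def cv(durations):
    """Overall variability: population standard deviation / mean."""
    m = sum(durations) / len(durations)
    sd = (sum((d - m) ** 2 for d in durations) / len(durations)) ** 0.5
    return sd / m

# Illustrative vowel durations (ms). The alternating sequence has high
# contrast between neighbors; the ramp spans the same range but changes
# gradually, so its neighbor-to-neighbor contrast is low.
alternating = [60, 140, 60, 140, 60, 140]
ramp        = [60, 76, 92, 108, 124, 140]
print(round(npvi(alternating)), round(npvi(ramp)))   # → 80 17
print(round(cv(alternating), 2), round(cv(ramp), 2)) # → 0.4 0.27
```

The dissociation is why White and Mattys could ask which measure better predicts accent judgments at all: if the two always moved together, the comparison would be empty.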
This is a very promising approach, because different rhythmic measures can be pitted against each other to see which best predicts perceptual data. However, the findings to date must be considered tentative because of an uncontrolled variable. This is variability in the degree to which nonnative speakers accurately produce the phonemes of the second language (i.e., the individual vowels and consonants). When judging a speaker’s degree of foreign accent, listeners almost certainly base their judgments on some combination of segmental and suprasegmental cues. This is a problem because some nonnative speakers may produce native-sounding prosody but nonnative sounding segmental material, or vice versa. To compound the problem, different listeners may vary in the extent to which they weight segmental versus suprasegmental cues in judging how “foreign” a given nonnative speaker sounds. Thus to truly focus listeners’ attention on rhythm, segmental cues must be made uniform. Resynthesis techniques, such as those used by Ramus and colleagues (cf. section 3.3.1, subsection “Perception and Typology”) might provide one way to make sentences spoken by different nonnative speakers uniform in terms of phonemic material while preserving prosodic differences.
Although the history of speech rhythm research is tightly bound up with notions of periodicity (e.g., the isochrony of stresses or syllables), the evidence reviewed above suggests that the case for periodicity in speech is extremely weak. Thus progress in the study of speech rhythm requires conceptually decoupling “rhythm” and “periodicity,” a point made in the introduction of this chapter. It is quite clear that speech has rhythm in the sense of systematic temporal, accentual, and grouping patterns of sound, and languages can be similar or different in terms of these patterns. However, the rhythms of language are not based on the periodic occurrence of any linguistic unit. Instead, the patterning is largely the by-product of phonological phenomena, such as the structure of syllables, vowel reduction, the location of lexical prominence, stress clash avoidance, and the prosodic phrasing of sentences. These phenomena lead to differences in the way utterances are organized in time.
The notion that rhythm in language is primarily consequence rather than construct stands in sharp contrast to rhythm in music, in which patterns of timing and accent are a focus of conscious design. Another salient difference between rhythm in speech and music, related to the lack of a periodic framework for speech rhythm, is the fact that speech rhythm conveys no sense of motion to a listener (cf. section 3.2.5). Do these differences mean that rhythm in language and music cannot be meaningfully compared? Absolutely not. As demonstrated in section 3.5 below, empirical comparisons are not only possible, they can be quite fruitful. They have had nothing to do with periodicity, however.
For those interested in cross-domain studies of rhythm, it is heartening to note that there is renewed interest in empirical studies of rhythm in speech production and speech perception (Ramus et al., 1999; Ramus & Mehler, 1999; Low et al., 2000; Grabe & Low, 2002; Lee & Todd, 2004; White & Mattys, 2007), and that there is much room for further work. For example, there is a need for more empirical data on listeners’ judgments of how native-sounding a foreign speaker’s utterances are, from the standpoint of rhythm. Such studies will need to employ creative ways of isolating the rhythm of speech from other phonetic dimensions of language, for example, using resynthesized speech in which phonetic content and pitch contours can be completely controlled (cf. Ramus & Mehler, 1999). There is also a need for studies that measure temporal patterning at multiple linguistic levels and that quantify relations between levels. It may be that important perceptual dimensions of speech rhythm are relational, such as having a high degree of contrast between adjacent syllable durations while simultaneously having a low degree of contrast between the durations of interstress intervals (some data pertinent to this idea are given at the end of this section). This is an area in which collaborations between linguists and music researchers would be especially useful.
In the remainder of this section, I would like to consider why periodicity has been (and continues to be) such an enduring concept in speech rhythm research. Below I offer several reasons for this historical phenomenon.
The simplest reason, of course, is the mistaken notion that rhythm is periodicity, or that rhythm is a regular alternation between strong and weak beats, rather than the broader notion of rhythm as systematic temporal, accentual, and phrasal patterning of sound, whether or not this patterning is periodic. Indeed, one need not look beyond music to see that a definition of rhythm as periodicity or as strong-weak beat alternation is overly simplistic: Many widespread musical forms lack one and/or the other of these features yet are rhythmically organized (cf. section 3.2).
The second reason that the notion of periodicity has endured may be the idea that it has a useful function in speech perception, such as making salient information predictable in time. There are psychological theories of auditory perception that propose that attention can be allocated more efficiently when events are temporally predictable, based on the idea that auditory attention employs internal oscillatory processes that synchronize with external rhythmic patterns (e.g., Jones, 1976; Large & Jones, 1999). Such theories provide a rationale for those interested in the idea that periodicity in speech is perceptually adaptive. Alternatively, those interested in periodicity might claim that it is useful because it creates a framework within which deviations are meaningful. This is the basis of Lehiste’s idea that lengthening of interstress intervals in English can be used to mark phrase boundaries (cf. section 3.3.3, subsection “The Role of Rhythmic Predictability in Speech Perception”). The principal drawback of these perception-based arguments for periodicity is that the evidence for them is very weak. Although further research is needed, the current evidence suggests that periodicity does not have an important role to play in normal speech perception. This should not be surprising: The comprehension of speech should be robust to variation in the timing of salient events, because such variations can occur for a number of reasons. For example, a speaker may suddenly speed up or slow down for rhetorical reasons within a conversation. Under such conditions, relying on periodicity for comprehension seems a maladaptive strategy.
The third reason for periodicity’s allure may be the belief that because various temporal patterns in human physiology (e.g., heartbeat, walking, chewing) exhibit periodic structure, speech is also likely to be periodic, perhaps even governed by rhythmic pattern generators. However, the use of rhythmic neural circuits for speech is not particularly plausible. The constant use of novel utterances in language means that articulators must be coordinated in different ways each time a new sentence is produced. Furthermore, the maneuvers that produce particular speech sounds depend on the local context in which they occur. Thus the motor patterns of speech cannot be predicted in advance with a high degree of precision. Without stereotyped movement patterns, evolution has no grounds for placing the control of speech in a rhythmic neural circuit. An analogy here is to multifingered touch-typing, a behavior involving the sequencing of overlapping movements of multiple articulators (the fingers). Although touch-typing is highly temporally organized, the resulting sequences are not based on periodic movements.
So far I have focused on negative reasons for the persistence of the concept of periodicity in speech. I will now briefly speculate on the positive reasons for the persistence of this concept, in other words, why periodicity in speech has been such an intuitively appealing notion to speech researchers, particularly those whose native language is English. (It is notable that the idea of periodicity in speech was promulgated by linguists who were English speakers, and that arguments for stress isochrony in English have been present since at least the 18th century; cf. Abercrombie, 1967:171; Kassler, 2005). First, it seems that English speakers find the interstress interval (ISI) to be a salient temporal unit in speech. For example, in a preliminary study, Cummins (2002) asked English listeners to repeat nonsense phrases such as “manning the middle” in time with an external pacing cue, so that the two stressed syllables were perceptually aligned with a periodically repeating two-tone pattern (for example, “man” would align with a high tone, and “mid” with a low tone; cf. Cummins & Port, 1998). This is equivalent to aligning the start and end of the ISI with two tones. Cummins also tested speakers of Italian and Spanish because these languages have lexical stress, permitting phrases to be constructed in a manner analogous to the English phrases (such as “BUSca al MOto” in Spanish, stress indicated by capitalization). Cummins observed that although English speakers learned the task quickly and performed accurately, Spanish and Italian speakers took much longer, were uncomfortable with the task, and produced a great deal of variability in their results. Cummins suggests that this difference is due to the fact that the ISI is not a salient perceptual unit for speakers of Italian and Spanish, despite the fact that there is lexical stress in these languages.
This intriguing finding raises the question of why ISI is salient to English listeners. Does it play some functional linguistic role? As discussed in section 3.3.3 (subsection “The Role of Rhythmic Predictability in Speech Perception”), ISI duration may play a role in signaling linguistic boundaries to English listeners, even if ISIs are not isochronous. Currently there is not enough evidence to say confidently what role the ISI plays, but let us assume for a moment that English speakers and listeners are sensitive to it as an entity. Given the empirical observations about the large variability in ISI duration in English (e.g., coefficients of variation around 33%; Dauer, 1983), why would listeners ever feel that ISIs were isochronous? One answer may concern the relative degree of variability in ISI durations compared to syllable durations. As suggested above, the impression of isochrony may be due in part to a lower degree of contrast between successive ISI durations relative to successive syllable durations. For example, consider Figure 3.12, which shows the same sentence as Figure 3.8a and Sound Example 3.5a (“the last concert given at the opera was a tremendous success”).
Syllable boundaries are marked with vertical lines (as in Figure 3.8), but now stressed syllables (indicated by boldface above) have been marked with an asterisk. The asterisk was placed at the vowel onset of the stressed syllable, in other words, near its perceptual attack (its "P-center"; Morton et al., 1976; Patel et al., 1999). In this sentence, the nPVI of syllable durations is 59.3, and the nPVI of ISIs is 28.4, making the ratio nPVI_ISI/nPVI_syll equal to 0.48. Thus in this particular case, the amount of durational contrast between adjacent ISIs is only about half of that between adjacent syllables.
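The nPVI values quoted above follow directly from the measured durations via the standard formula of Grabe and Low (2002): 100 times the mean, over successive duration pairs, of the absolute difference divided by the pair's mean. A minimal sketch in Python; the duration values below are hypothetical stand-ins, not the actual measurements for the Figure 3.12 sentence:

```python
def npvi(durations):
    """Normalized pairwise variability index (Grabe & Low, 2002):
    100 * mean over successive pairs of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2).
    Identical sequences of durations give 0; larger values mean more
    durational contrast between adjacent elements."""
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# Hypothetical syllable and interstress-interval (ISI) durations, in seconds.
syllables = [0.08, 0.25, 0.12, 0.30, 0.10, 0.09, 0.22]
isis = [0.45, 0.52, 0.48]

# A ratio below 1 means adjacent ISIs contrast less than adjacent syllables do.
ratio = npvi(isis) / npvi(syllables)
```

For the real sentence in Figure 3.12 this ratio came out at 0.48; with the toy values above it is likewise well below 1.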
Figure 3.12 The English sentence of Figure 3.8a, with stresses marked by asterisks (*).
I have computed similar ratios for each of the 20 sentences of British English from Ramus's database.24 For 15 out of 20 sentences, this ratio was less than 1. The overall mean ratio across the 20 sentences was .83 (std = .45), and was significantly less than 1 (p < .0001 by a one-tailed t-test). An even stronger effect was observed when one computes the ratio of ISI duration variability to syllable duration variability (using the coefficient of variation, i.e., CV_ISI/CV_syll). Here 17 out of 20 sentences had a value less than 1, and the mean was .69 (std = .25), again significantly less than 1. Thus if the ear is sensitive to temporal patterning at the levels of both syllables and stresses, the low durational variability of ISIs relative to the variability of syllable durations might contribute to a sense that stresses are temporally regular. Of course, for this explanation to have any merit it must be shown that the ratio of ISI to syllable variability differentiates stress-timed from syllable-timed languages. Languages such as Italian and Spanish would be good candidates for testing this hypothesis, because they are syllable-timed languages in which stress can be reliably marked.
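The coefficient of variation used here is simply the standard deviation divided by the mean, so the CV_ISI/CV_syll ratio for one sentence can be sketched as follows. The durations are hypothetical illustrations, not Ramus's data, and whether to use the population or sample standard deviation is a modeling choice; this sketch uses the population form:

```python
import statistics

def cv(durations):
    """Coefficient of variation: population standard deviation / mean.
    A unitless measure of spread, so ISIs and syllables can be compared
    even though ISIs are much longer on average."""
    return statistics.pstdev(durations) / statistics.mean(durations)

# Hypothetical durations (seconds) for one sentence.
syllable_durations = [0.08, 0.25, 0.12, 0.30, 0.10, 0.09, 0.22]
isi_durations = [0.45, 0.52, 0.48]

# A ratio below 1 indicates that ISIs vary less, relative to their mean,
# than syllables do -- the pattern found for 17 of the 20 English sentences.
ratio = cv(isi_durations) / cv(syllable_durations)
```

To test whether such ratios are significantly below 1 across a set of sentences, one would then run a one-tailed one-sample t-test against 1, as reported above.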
Although the notion of isochrony in speech continues to beguile researchers, I suspect that it will have little or no role in the most fruitful research on speech rhythm in the coming years. Isochrony was important to the birth of speech rhythm studies, but it is a concept whose usefulness is exhausted. It is time to move on to a richer view of speech rhythm.
As in the rest of this book, the focus of this chapter is on comparing ordinary speech to instrumental music. However, no comparison of rhythm in language and music is complete without a discussion of poetry and song. In these art forms, words are carefully chosen and consciously patterned for rhythmic effect. Of course, poetry and song are but two of numerous vocal genres with organized rhythms. In the United States certain styles of preaching in African American churches are notable for their rhythmic patterning, a taste of which can be heard in Martin Luther King Jr.’s famous “I Have a Dream” speech. In other cultures, it is possible to identify numerous genres of speech in which rhythmic design plays a role (see Agawu, 1995, for a fascinating case study of the range of rhythmically regulated forms of speech in an African society). The focus here is on poetry and song, however, because these have received the greatest amount of empirical research in terms of rhythm.
The study of poetic rhythm has been the focus of a good deal of research by literary scholars (for introductions, see Gross, 1979; Fussell, 1979; Hollander, 2001). In this tradition, poetic “meter” refers to the abstract patterning scheme which governs the temporal structure of a poem, whereas “rhythm” refers to the actual patterning of durations and accents. For example, a great deal of English verse is written in iambic pentameter, a verse form consisting of 5 iambic feet, in which an iamb is a (weak + strong) syllable pattern. Naturally there are many exceptions to this pattern within iambic pentameter poetry: A particularly common one is the substitution of a trochaic foot, or (strong + weak) syllable pattern at the onset of a line. Thus the rhythm of a particular line may violate the overall meter of the poem.
Literary prosodists argue that listeners internalize the regularities of meter and perceive departures from this scheme as variation from a stable background (Richards, 1979:69; Adams, 1997:12). That is, meter is seen as having an intimate relationship with expectancy. This idea is related to the notion of musical meter as an abstract mental scheme, but differs from musical meter in an important way. Musical meter refers to temporal periodicity, whereas poetic meter involves configurational periodicity, in other words, the focus is on the repetition of some basic prosodic unit rather than on temporal periodicity per se. For example, in iambic pentameter it is the weak + strong configuration of the iambic foot that is the design focus, not the isochrony of stressed syllables. In various forms of French and Chinese verse, the number of syllables per line is strictly regulated, but there is no focus on making syllables periodic (equal in duration).
It is interesting to note that different languages tend to favor different kinds of poetic meters. For example, English verse has often tended toward purely stress-based forms in which regulation of the number of stresses per line is the focus, independent of the number of syllables (e.g., the meter of Beowulf, with four stresses per line). In contrast, English verse based on regulating the number of syllables per line without regard to stress is rare (Fussell, 1979:62-75). This likely reflects the powerful role of stress in ordinary English speech rhythm. Indeed, Fussell has argued that “a meter customary in a given language is customary just because it ‘measures’ the most characteristic quality of the language” (Fussell, 1974:498). Thus stress plays a dominant role in English poetry, but little role in French, in which the number of syllables per line is a more common concern. Japanese, in turn, often regulates the number of morae per line, as in the 5-7-5 mora structure of the haiku.
Lerdahl and Halle (1991) and Lerdahl (2003) have sought to unify the theoretical treatment of rhythm in poetry and music, using shared concepts such as hierarchical grouping structure and metrical grids. Our focus here, however, is on empirical research. Over the past few decades, poetic rhythm has attracted the interest of a number of speech scientists, who have made quantitative measurements of the temporal patterns of poetry. For example, the phonetician Gunnar Fant and colleagues have studied the acoustics of iambic versus trochaic lines of verse in Swedish (1991b). The researchers found that in iambic feet, the weak syllable is about 50% as long as the following strong syllable, whereas in trochaic feet, the weak syllable is about 80% of the duration of the preceding strong syllable (cf. Nord et al., 1990). This difference is likely due to preboundary lengthening, which acts to increase the duration of the final syllable in each foot (i.e., the strong syllable in an iamb and the weak syllable in a trochee). Thus iambic and trochaic feet are not simply mirror images of each other in terms of their temporal profiles: Iambic feet are much more temporally asymmetric.
These observations may be relevant to the study of the aesthetic effect of the two kinds of feet in poetic lines. For example, Adams (1997:55-57) notes that trochaic meters are often associated with awe and the suspension of reality, as in Blake’s poem, “The Tyger,” in which trochaic patterns dominate the first three lines of the first stanza:
(3.16)
Tyger! Tyger! burning bright
In the forests of the night
What immortal hand or eye
Could frame thy fearful symmetry?
This aesthetic property of trochaic meter may be partly due to its more uniform profile of syllabic durations, which goes against the grain of normal English speech rhythm and thus gives the resulting speech an incantatory feel.
Another prominent phonetician who has long conducted research on poetic rhythm is Ilse Lehiste (1991). In one set of studies, Lehiste examined the relationship between the timing of feet and of the lines in which feet are embedded. She found that the temporal variability of lines is lower than one would predict based on the variability of feet duration, suggesting that speakers make temporal compensations between feet in order to keep lines within a certain duration. That is, lines act as a unit of temporal programming in the recitation of poetry (Lehiste, 1990). Ross and Lehiste (1998, 2001) have also examined the interplay of linguistic and poetic rhythm in framing the temporal patterns of Estonian verse and folksongs.
For languages with clearly defined stress, such as English, each phrase or sentence comes with a distinct pattern of stronger and weaker syllables. When words in these languages are set to metrical music, a relationship is established between the syllabic accent patterns and musical metrical accent patterns. Sensitivity to these relationships is part of the skill of writing music with words, and empirical research suggests that composers exploit this relationship for artistic ends.
Palmer and Kelly (1992) studied vocal lines in themes from Gilbert and Sullivan's 14 operettas, focusing on compound nouns (like the single word "blackbird") and adjective-noun pairs (like the two-word phrase "black bird"). In English, compound nouns receive stress on the first syllable, whereas adjective-noun pairs receive stress on the second syllable. They studied how such words were aligned with the metrical structure of the music, and found that the stressed syllable tended to align with a metrically strong beat in the music. Given the complex texts of Gilbert and Sullivan's songs, this strategy of alignment may contribute a sense of precision and balance to the lyrics of these operettas.
Temperley (1999) looked at a different genre of vocal music, namely rock songs. In contrast to Palmer and Kelly, he found that verbal stress frequently anticipated metrical accent in rock songs by a fraction of a beat, as in the Beatles’ “Here Comes the Sun” (Figure 3.13). This systematic anticipation contributes a sense of syncopation and rhythmic energy to the song, and provides an example of how the systematic misalignment of verbal and musical stress adds dynamic energy to music.
The relationship between rhythm in speech and song is a fertile area that merits much more empirical investigation than it has received to date. In the remainder of this section, I outline three directions that this research could take. The first pertains to cultural differences in the prevalence of certain types of musical rhythms. For example, Yamomoto (1996) notes that children’s songs based on triple rhythms (e.g., 6/8 time signature) are rare in Japan but common in Britain, and suggests that this might be due to differences in English versus Japanese speech rhythm. If Yamomoto is correct, one would predict that Japanese- versus English-speaking children would differ in how easily they can learn songs in these meters. Another language with which one could test a similar idea is Greek. Recall from section 3.3.2 (subsection “Questioning the Principle of Rhythmic Alternation in Speech”) that Arvaniti (1994) showed that Greek, a language of the Balkan region, tolerates a more irregular alternation between stressed and unstressed syllables than does English. Also recall from section 3.2 that the Balkan region features music with irregularly spaced beats. Would Greek-speaking children find it easier to learn the irregular meters of Balkan songs than English-speaking children (cf. Hannon & Trehub, 2005)? In the studies outlined above, it would of course be essential that the two groups of children be matched for prior musical exposure to different musical meters. It may thus be best to work with immigrants who speak the native language at home but whose children are exposed to Western music. In such a case, if learning experiments reveal the predicted cultural differences, this would support the interesting hypothesis that a culture’s speech rhythm predisposes it toward or away from certain musical rhythms.
A second direction for research in this area is to examine verbally improvised music that is accompanied by a rhythmic musical context, such as contemporary rap music. If the vocal and musical lines can be recorded on different audio tracks, and points of verbal and musical stress can be independently identified, then one could study temporal relations between verbal and musical accent points as a piece unfolds in time. It would be particularly interesting to study these relations in novice versus expert rap musicians, to see if part of being an expert in this genre is greater flexibility and/or precision in the manner in which the alignment of the two types of accents is handled. In a study such as this, identifying precise points in time for verbal and musical accent will be essential, and the issue of the perceptual attack time of syllables and of musical tones (“P-centers”) comes to the fore (Morton et al., 1976; Gordon, 1987; Patel et al., 1999).
Figure 3.13 A musical metrical grid for a portion of the Beatles’ “Here Comes the Sun.” The lyrics are aligned below the grid, and linguistically stressed syllables are indicated in boldface: Note how most such syllables slightly precede strong metrical positions in the music. From Temperley, 1999.
A final possible line of research is suggested by a correspondence between Richard Strauss and Romain Rolland in 1905 about musical text setting (Myers, 1968).25 Strauss emphasizes the clear-cut relationship between syllabic accent in speech and metrical accent in music: “In German ‘she’ on the strong beat of a bar is absolutely impossible. For example, in a bar of 4/4, the first and third beat always have a necessary stress which can only be made on the radical [stressed] syllable of each word.” He also expresses his frustration over variability in the alignment of word stress and musical stress in French opera: “Yesterday, I again read some of Debussy’s Pélleas et Mélisande, and I am once more very uncertain about the principle of the declamation of French when sung. Thus on page 113, I found: ‘Cheveúx, chéveux, dé cheveux.’ For heaven’s sake, I ask you, of these three ways there can all the same only be one which is right.”
Rolland replies by emphasizing the mutability and subtlety of French word accent:
The natural value of "cheveux" is chevéux. But a man in love will, when saying this word, put quite a special stress on it: "tes chéveux." . . . You see, the great difficulty with our language is that for a very large number of words, accentuation is variable,—never arbitrary, but in accordance with logical or psychological reasons. When you say to me: ". . . Of these 3 (cheveux) only one can be right," what you say is doubtless true of German, but not for French.
What this correspondence suggests is that German text setting is more rigid in its alignment of verbal and musical accent, whereas French is more permissive in terms of accent alignment between text and music. Indeed, Dell and Halle (in press; cf. Dell, 1989) report that French text-setting is quite tolerant of mismatches between verbal and musical accent, except at the ends of lines, where alignment tends to be enforced. They contrast this high degree of “mismatch tolerance” in French songs to a much lower degree found in English songs. The correspondence between Strauss and Rolland, and the work of Dell and Halle, suggest that languages have salient differences in the way they align music and text in terms of rhythmic properties, though quantitative work is needed to confirm this. These writings also lead to ideas for cross-cultural perceptual studies testing sensitivity to accent mismatches in songs. Specifically, listeners could be presented with different versions of a song that have text and tune aligned in different ways, with one version having many more accent mismatches. Listeners could then be asked to judge in which version the words and music go best together. One might predict that German listeners judging German songs (or English listeners judging English songs) would be more selective in terms of the pairings that sound acceptable than French listeners judging French songs.
One major theme of this chapter is that languages have rhythm (systematic temporal, accentual, and grouping patterns), but that this rhythm does not involve the periodic recurrence of stresses, syllables, or any other linguistic unit. Initially it may seem that “giving up on periodicity in speech” would mean that there is little basis for comparing rhythm in music and language. In fact, the opposite is true. By abandoning a fixation on periodicity one is freed to think more broadly about speech rhythm and its relationship to musical rhythm. As we shall see below, a focus on nonperiodic aspects of linguistic rhythm is proving fruitful in terms of comparing language and music at structural and neural levels.
The notion that a nation’s instrumental music reflects the prosody of its language has long intrigued music scholars, especially those interested in “national character” in music. Gerald Abraham explored this idea at length (1974, Ch. 4), noting as one example an observation of Ralph Kirkpatrick on French keyboard-music: “Both Couperin and Rameau, like Fauré and Debussy, are thoroughly conditioned by the nuances and inflections of spoken French. On no Western music has the influence of language been stronger” (p. 83). In a more succinct expression of a similar sentiment, Glinka (in Theater Arts, June 1958) wrote: “A nation creates music, the composer only arranges it” (cited in Giddings, 1984:91).
Until very recently, evidence for this idea has been largely anecdotal. For example, Garfias (1987) has noted that in Hungarian each word starts with a stressed syllable, and that Hungarian musical melodies typically start on strong beats (i.e., anacrusis, or upbeat, is rare). Although this is an interesting observation, it is possible that this is due to the fact that many such melodies come from folk songs. In this case, the linguistic influence on musical rhythm would be mediated by text. A more interesting issue, implied by Kirkpatrick, is whether linguistic rhythm influences the rhythm of instrumental music, in other words, music that is not vocally conceived.
One approach to this question was suggested by Wenk (1987). He proposed that cultures with rhythmically distinct languages should be examined to see if differences in musical rhythm reflect differences in speech rhythm. Wenk focused on English and French, prototypical examples of a stress-timed versus a syllable-timed language. Wenk and Wioland (1982) had previously argued that a salient rhythmic difference between the two languages was that English grouped syllables into units beginning with a stressed syllable, whereas French grouped syllables into units ending with a stressed syllable, as in:
Wenk and Wioland further argued that the stress at the ends of rhythmic groups in French was marked primarily by durational lengthening. Based on this idea, Wenk (1987) predicted that phrase-final lengthening would be more common in French versus English instrumental music. He tested this idea by having a professional musician mark phrase boundaries in English versus French classical music. The number of phrases in which the final note was the longest note in the phrase was then tallied for both cultures. Wenk found that it was indeed the case that more such phrases occurred in French than in English music.
Wenk’s study was pioneering in its empirical orientation, but it also had limitations that make it difficult to accept these findings as a firm answer to the question of interest. Only one composer from each culture was examined (Francis Poulenc and Benjamin Britten), and from the oeuvre of each composer, only one movement from one piece was selected. Furthermore, no comparable empirical data for language rhythm were collected (e.g., the degree of phrase-final lengthening in English vs. French speech).
Despite its limitations, Wenk’s study outlined a useful approach, namely to identify empirical rhythmic differences between two languages and then determine if these differences are reflected in the music of the two cultures. Pursuing this idea in a rigorous fashion entails three requirements. First, an empirical measure of speech rhythm is needed to quantify rhythmic differences between languages. Second, this same measure should be applicable to music, so that language and music can be compared in a common framework. Third, both the linguistic and musical samples need to be broad enough to ensure that the findings are not idiosyncratic to a few speakers or composers.
Joseph Daniele and I conducted a study that set out to meet these criteria (Patel & Daniele, 2003a). Like Wenk, we focused on British English and French due to their distinct speech rhythms and because they have been the locus of strong intuitions about links between prosody and instrumental music (e.g., Hall, 1953; Abraham, 1974; Wenk, 1987). Our work was inspired by recent phonetic research on empirical correlates of stress-timed versus syllable-timed speech rhythm (cf. section 3.3.1, subsection “Duration and Typology”). In particular, the work of Low, Grabe, and Nolan (2000) attracted our attention because it focused on something that could be measured in both speech and music, namely the durational contrast between successive elements in a sequence. Their measure, called the normalized pairwise variability index, or nPVI, had been applied to vowels in sentences from stress-timed and syllable-timed languages, and had been shown to be higher in stress-timed languages, likely due to the greater degree of vowel reduction in these languages (Grabe & Low, 2002; Ramus, 2002a; Lee & Todd, 2004; see the above-mentioned subsection of 3.3.1 for background on the nPVI).
Two aspects of this measure made it appealing for use with music. First, the nPVI is a purely relative measure of contrast. That is, the durational difference between each pair of intervals is measured relative to the average duration of the pair. This normalization, which was originally introduced to control for fluctuations in speech rate, makes the nPVI a dimensionless quantity that can be applied to both language and music. (For example, nPVI can be computed from speech durations measured in seconds and from musical durations measured in fractions of a beat.) Second, the nPVI has been applied to vowels. Vowels form the core of syllables, which can in turn be compared to musical tones (i.e., in setting words to music it is quite common for each note to be assigned to one syllable).26 Our strategy, then, was to apply the nPVI to tone sequences from British and French instrumental music, to determine if differences emerged that reflected the rhythmic differences between British English and French speech.
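The computation behind the nPVI, and its scale-invariance, can be made concrete with a short sketch. The function below follows the standard definition (mean pairwise durational contrast, each contrast normalized by the pair’s mean, scaled by 100); the example durations are invented for illustration, not taken from any corpus:

```python
def npvi(durations):
    """Normalized pairwise variability index of a duration sequence.

    Each successive pair's absolute difference is divided by the pair's
    mean, so the result is dimensionless: rescaling all durations
    (seconds, milliseconds, fractions of a beat) leaves it unchanged.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    contrasts = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(contrasts) / len(contrasts)

# Scale invariance: the same (hypothetical) vowel durations expressed
# in seconds and in milliseconds yield identical nPVI values.
print(npvi([0.12, 0.04, 0.09]))
print(npvi([120, 40, 90]))
```

Because of the per-pair normalization, a sequence of identical durations scores 0, while a maximally alternating pair such as 120 ms followed by 40 ms contributes a contrast of 1 (i.e., 100 on the scaled index).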
Figure 3.14 shows the nPVI for British English versus continental French speech, based on measurements of vowel durations in sentences uttered by native speakers of each language. (The sentences are short, news-like utterances from the corpus of Nazzi et al., 1998.)27
Figure 3.14 The nPVI of British English and French sentences. Error bars show +/– 1 standard error. Data from Patel, Iversen, & Rosenberg, 2006.
The nPVI is significantly higher for English than for French speech. Figure 3.15 gives an intuition for why this is the case by illustrating the pattern of vowel duration for one English and one French sentence in this corpus (cf. Sound Examples 3.8a, b).
For example, in the top panel, the first two values (about 120 ms and 40 ms) are the durations of the first two vowels in the sentence (i.e., the vowels in “Finding”), and so on. Note how successive vowels tend to differ more in duration for the English sentence than for the French sentence. In the English sentence, some vowels are very short (often due to vowel reduction), whereas other vowels are quite long (often due to stress). This leads to a greater tendency for durational contrast between neighboring vowels, which is reflected in the nPVI score.
As mentioned above, an appealing aspect of the nPVI is that it can be applied to music in order to measure the durational contrast between successive notes. Western music notation indicates the relative duration of notes in an unambiguous fashion, as shown in Figure 3.16.
In the figure, the first note of each theme is arbitrarily assigned a duration of 1, and the durations of the remaining notes are expressed as a multiple or fraction of this value. (Any numerical coding scheme that preserves relative duration of notes would yield the same nPVI, because it is a normalized measure.) In this example, the nPVI of the Debussy theme is lower than that of the Elgar theme, even though the raw variability of note duration in the Debussy theme is greater than that in the Elgar theme (as measured by the coefficient of variation, in other words, the standard deviation divided by the mean). This emphasizes the fact that the nPVI indexes the degree of contrast between successive elements in a sequence, not the overall variability of those elements.
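The distinction between successive contrast and overall variability can be demonstrated directly. In the sketch below, two invented sequences contain exactly the same set of durations, so their coefficient of variation is identical, yet their nPVI values differ sharply because the orderings differ (the sequences are illustrative, not the Debussy and Elgar themes):

```python
import statistics

def npvi(durations):
    """nPVI: mean pairwise durational contrast, normalized and scaled by 100."""
    contrasts = [abs(a - b) / ((a + b) / 2.0)
                 for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(contrasts) / len(contrasts)

def coeff_of_variation(durations):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(durations) / statistics.fmean(durations)

alternating = [1, 2, 1, 2, 1, 2]   # contrast at every step
blocked     = [1, 1, 1, 2, 2, 2]   # same durations, contrast at one step

# Identical overall variability...
assert abs(coeff_of_variation(alternating) - coeff_of_variation(blocked)) < 1e-12
# ...but very different successive contrast.
print(npvi(alternating))   # high: neighbors always differ
print(npvi(blocked))       # low: neighbors almost never differ
```

This is exactly the property noted in the text: the nPVI indexes how much neighboring elements differ, not how variable the sequence is overall.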
Figure 3.15 Vowel durations in an English and a French sentence. Note the greater degree of short-long contrast in the English sentence between adjacent vowel durations. The nPVI for the English sentence is 54.9, and for the French sentence is 30.0.
Our source of musical material was a standard reference work in musicology, A Dictionary of Musical Themes, Second Edition (Barlow & Morgenstern, 1983), which focuses on the instrumental music of Western European composers. In choosing composers to include in our study we were guided by two factors. First, the composers had to be from a relatively recent musical era because measurements of speech prosody are based on contemporary speech, and languages are known to change over time in terms of sound structure. Second, the composers had to be native speakers of British English or French who lived and worked in England or France. Using these guidelines, we examined all English and French composers from Barlow and Morgenstern who were born in the 1800s and died in the 1900s, and who had at least five musical themes in the dictionary that were eligible for inclusion in the study (see Patel & Daniele, 2003a, for inclusion criteria, and for the rationale of using music notation rather than recorded music for nPVI analysis). We chose composers who spanned the turn of the century because this era is noted by musicologists as a time of “musical nationalism” in Europe.
Figure 3.16 Two musical themes with the relative durations of each note marked. nPVI of D122 = 42.2, of E72 = 57.1. Themes are from Barlow & Morgenstern, 1983. From Patel & Daniele, 2003a.
Based on our criteria, 16 composers were included in the study, including English composers such as Elgar, Delius, and Vaughan Williams, and French composers such as Debussy, Poulenc, and Saint-Saens. About 300 musical themes were represented, and one musical nPVI value was computed for each theme. The results of our analysis of musical nPVI are shown in Figure 3.17, along with the speech nPVI values. Remarkably, the two cultures have significantly different musical nPVI values, with the difference being in the same direction as the linguistic nPVI difference (see Patel & Daniele, 2003a, and Patel et al., 2006, for further details).
Thus there is empirical evidence that speech rhythm is reflected in musical rhythm, at least in turn-of-the-century classical music from England and France. How is this connection between language and music mediated? Some musicologists have proposed that national character arises from composers adapting folk melodies into their compositions. Because such melodies are typically from songs, it may be that the rhythm of words influences the rhythm of these melodies, thus giving the melodies a language-like rhythmic pattern. However, we believe that this may not be the best explanation for our finding, because our study included numerous composers who are not thought to be strongly influenced by folk music, such as Elgar and Debussy (Grout & Palisca, 2000). Instead, we feel there may be a more direct route from language to music. It is known from studies of language acquisition that the perceptual system is sensitive to the rhythmic patterns of language from a very early age (Nazzi et al., 1998; Ramus, 2002b). Composers, like other members of their culture, internalize these patterns as part of learning to speak their native language. (One mechanism for this internalization is a process called statistical learning, which is discussed in more detail in the next chapter.) We suggest that when composers write music, linguistic rhythms are “in their ears,” and they can consciously or unconsciously draw on these patterns in weaving the sonic fabric of their music. This does not imply that the connection between linguistic and musical rhythm is obligatory. Rather, this link is likely to be greater in historical epochs where composers seek a national character for their music.
Figure 3.17 The nPVI of British English and French musical themes. Error bars show +/– 1 standard error. Data from Patel & Daniele, 2003a, and Patel, Iversen, & Rosenberg, 2006.
Our findings for English and French speech and music immediately raised two questions. Would the musical nPVI difference be observed if a broader sample of English and French themes and composers were studied? Perhaps more importantly, would our result generalize to other cultures in which stress- versus syllable-timed languages are spoken? Fortunately, Huron and Ollen (2003) provided answers to these questions. Using an electronic version of A Dictionary of Musical Themes created by Huron, they computed the nPVI of a much larger sample of English and French musical themes (about 2000 themes, composed between the mid-1500s and mid-1900s). They confirmed that the nPVI of English music was significantly higher than that of French music, though the difference was smaller than that found by Patel and Daniele (likely due to less stringent sampling criteria). They also computed the musical nPVI for a range of other nations, analyzing almost 8,000 themes from 12 nationalities over more than 3 centuries. Of the nationalities they examined, five can be assigned to stress-timed languages and three to syllable-timed languages (Fant et al., 1991a; Grabe & Low, 2002; Ramus, 2002b). These are listed in Table 3.1 along with their musical nPVI values. (The data in Table 3.1 represent corrected values of the original table in Huron & Ollen, 2003, kindly provided by David Huron. See this chapter’s appendix 2 for data from more cultures.)
Four out of the five nations with stress-timed languages (American, Austrian, English, and Swedish) do indeed have higher musical nPVI values than the three nations with syllable-timed languages, providing support for the idea that stress-timed and syllable-timed languages are associated with distinctive musical rhythms. However, German music is a notable exception: It has a low musical nPVI value despite the fact that German is a stress-timed language with a high nPVI value for speech (Grabe & Low, 2002; Dellwo, 2004).
However, there may be a historical reason why German music has a low nPVI, namely the well-known influence of Italian music on German music (Kmetz et al., 2001). Because Italian music has a low nPVI, stylistic imitation of this music might outweigh any linguistic influence of the German language on the nPVI of German music. One way to test this idea is to examine the nPVI in historical perspective, for example, as a function of each composer’s birth year. When themes from 14 German composers were examined in this fashion a striking trend emerged, as shown in Figure 3.18 (Patel & Daniele, 2003b; Daniele & Patel, 2004).
Table 3.1 Musical nPVI Values for Eight Different Nationalities
Over the course of 250 years, nPVI almost doubled, a trend that is highly statistically significant. (Interestingly, this trend is also evident for the six Austrian composers we included in our study.) Given what is known about the history of the German language, this is unlikely to reflect a change in the rhythm of German from syllable-timed to stress-timed during this period (C. Heeschen, personal communication). Instead, it most likely reflects historical changes in musical style, perhaps including a waning influence of Italian music on German music over this period. In fact, the finding would be consistent with the idea that Italian music had a strong influence on German music during the Baroque era (1600-1750), less influence during the Classical era (1750-1825), and the least influence during the Romantic era (1825-1900). More generally, it suggests that in studying linguistic influences on musical rhythm, it is important to keep in mind historical influences that can run counter to linguistic influences.28
Figure 3.18 nPVI as a function of composer birth year for 20 composers. (Solid dots = German composers; open dots = Austrian composers.) The best-fitting linear regression line is shown. From Patel & Daniele, 2003b.
Taking a step back, musical nPVI research demonstrates that the rhythmic structure of speech and music can be fruitfully compared without any resort to notions of periodicity. It is worth noting that the research done so far hardly exhausts what can be done using this measure. For example, one could apply the nPVI to recordings of performed music rather than to music notation. One could also examine the nPVI of performances of the same piece of instrumental music by musicians who speak stress-timed versus syllable-timed languages, to see if the native language influences temporal patterns in music performance (cf. Ohgushi, 2002). Finally, one could study improvised music, for example, by studying jazz musicians who speak different dialects with different rhythmic qualities (e.g., in the United States, perhaps a northeast dialect versus a southern dialect). In this case, the nPVI could be used to investigate whether the temporal pattern of speech is reflected in the rhythm of improvised music.
The idea that nonlinguistic rhythm perception can be influenced by one’s native language has been articulated by both linguists and music researchers. Over 50 years ago, Jakobson, Fant, and Halle (1952:10-11) made the following claim:
Interference by the language pattern affects even our responses to nonspeech sounds. Knocks produced at even intervals, with every third louder, are perceived as groups of three separated by a pause. The pause is usually claimed by a Czech to fall before the louder knock, by a Frenchman to fall after the louder; while a Pole hears the pause one knock after the louder. The different perceptions correspond exactly to the position of word stress in the languages involved: in Czech the stress is on the initial syllable, in French, on the final and in Polish, on the penult.
The groupings suggested by Jakobson et al. can be schematically represented as follows, in which each x represents a knock, the uppercase X’s are louder, and the vertical bars mark the perceived pauses:

Czech:  | X x x | X x x | X x x |
French: | x x X | x x X | x x X |
Polish: | x X x | x X x | x X x |
The claim of Jakobson et al. is certainly provocative, but there has been no empirical evidence to support it. Nevertheless, the idea of a link between native language and nonlinguistic rhythm perception persists. For example, Stobart and Cross (2000) have documented a form of music from the Viacha people of the Bolivian highlands in which the local manner of marking the beat is different from what most English-speaking listeners perceive. Sound Example 3.9 illustrates this music with an Easter song played on a small guitar (charango). The position at which the Viacha clap or tap their foot to the beat can be heard at the end of the excerpt. This tendency to mark the shorter event in each group of two notes as the beat is contrary to the tendency of English speakers to hear the pattern iambically, that is, with the beat on the second event of each pair. Stobart and Cross speculate that the tendency to mark the beat trochaically (on the first member of each group) is related to stress patterns in words of the local language, Quechua.
The two examples above differ in that the former concerns segmentation (rhythmic grouping), whereas the latter concerns beat perception. Note, however, that neither refers to notions of periodicity in speech. Instead, they both refer to patterns of lexical stress and how this influences nonlinguistic auditory perception. Thus once again we see that interesting claims about rhythmic relations between music and language can be made without any reference to periodicity in speech.
How can one assess whether the native language influences the perception of nonlinguistic rhythm? As a first step, it is necessary to demonstrate that there are cultural differences in nonlinguistic rhythm perception. Rhythmic segmentation or grouping is of particular interest in this regard, as intimated by Jakobson et al. (1952). This is because psycholinguistic research indicates that the rhythm of one’s native language leads to segmentation strategies that are applied even when listening to a foreign language. The idea that the native language can influence nonlinguistic rhythmic segmentation is thus just one step away (cf. section 3.3.3, subsection “The Role of Rhythm in Segmenting Connected Speech”).
Yet at the current time, it is widely believed that elementary grouping operations reflect general auditory biases not influenced by culture. This belief stems from a century-old line of research in which researchers have investigated rhythmic grouping using simple tone sequences (Bolton, 1894; Woodrow, 1909). For example, listeners are presented with tones that alternate in loudness (. . . loud-soft-loud-soft . . .) or duration (. . . long-short-long-short . . .) and are asked to indicate their perceived grouping. Two principles established a century ago, and confirmed in numerous studies since, are widely accepted:
1. A louder sound tends to mark the beginning of a group.
2. A lengthened sound tends to mark the end of a group.
These principles have come to be viewed as universal laws of perception, underlying the rhythms of both speech and music (Hayes, 1995b; Hay & Diehl, 2007). However, the cross-cultural data have come from a limited range of cultures (American, Dutch, and French). Are the principles truly universal? A study by Kusumoto and Moreton (1997) suggested otherwise, finding that American versus Japanese listeners differed with regard to Principle 2 above. This study motivated a replication and extension of this work by Iversen, Patel, and Ohgushi (2008), described below.
Iversen et al. had native speakers of Japanese and native speakers of American English listen to sequences of tones. The tones alternated in loudness (“amplitude” sequences, Sound Example 3.10a) or in duration (“duration” sequences, Sound Example 3.10b), as shown schematically in Figure 3.19.
Listeners told the experimenters how they perceived the grouping. The results revealed that Japanese and English speakers agreed with Principle 1: both reported that they heard repeating loud-soft groups. However, the listeners showed a sharp difference when it came to Principle 2. Although English speakers perceived the “universal” short-long grouping, many Japanese listeners strongly perceived the opposite pattern, in other words, repeating long-short groups (cf. Figure 3.19). Because this finding was surprising and contradicted a “law” of perception, Iversen et al. replicated it with listeners from different parts of Japan. The finding is robust and calls for an explanation. Why would native English and Japanese speakers differ in this way?
Figure 3.19 Left side: Schematic of sound sequences used in the perception experiment. These sequences consist of tones alternating in loudness (“amplitude sequence,” top), or duration (“duration sequence,” bottom). In the amplitude sequence, thin bars correspond to softer sounds and thick bars correspond to louder sounds. In the duration sequence, short bars correspond to briefer sounds and long bars correspond to longer sounds. The dots before and after the sequences indicate that only an excerpt of a longer sequence of alternating tones is shown. Right side: Perceived rhythmic grouping by American and Japanese listeners, indicated by ovals. Solid black ovals indicate preferences that follow “universal” principles of perception, while the dashed black oval indicates a preference that violates the purported universals.
Assuming that these different perceptual biases are not innate, the key question is what aspect of auditory experience might be responsible for this difference. Two obvious candidates are music and speech, because these sound patterns surround humans throughout their life. Both patterns present the ear with sequences of sound that must be broken into smaller coherent chunks, such as phrases in music, or phrases and words in speech. Might the temporal rhythm of these chunks differ for music or speech in the two cultures? That is, might short-long patterns be more common in American music or speech, and long-short be more common in Japanese music or speech? If so, then learning these patterns might influence auditory segmentation generally, and explain the differences we observe.
Focusing first on music, one relevant issue concerns the rhythm of how musical phrases begin in the two cultures. For example, if most phrases in American music start with a short-long pattern (e.g., a “pick-up note”), and most phrases in Japanese music start with a long-short pattern, then listeners might learn to use these patterns as segmentation cues. To test this idea, we examined phrases in American and Japanese children’s songs (because we believe these perceptual biases are probably laid down early in life). We examined 50 songs per culture, and for each phrase we computed the duration ratio of the first to the second note and then counted how often phrases started with a short-long pattern versus other possible patterns (e.g., long-short, or equal duration). We found that American songs show no bias to start phrases with a short-long pattern. Interestingly, Japanese songs show a bias to start phrases with a long-short pattern, consistent with our perceptual findings. However, the musical data alone cannot explain the cultural differences we observe, because this data cannot explain the short-long grouping bias of American listeners.
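The phrase-opening analysis described above amounts to classifying the duration ratio of the first two notes in each phrase. A minimal sketch of that classification follows; the phrase lists and their durations are invented placeholders, not the children's-song corpus used in the study:

```python
from collections import Counter

def opening_pattern(note_durations):
    """Classify a phrase opening by the durations of its first two notes."""
    first, second = note_durations[0], note_durations[1]
    if first < second:
        return "short-long"
    if first > second:
        return "long-short"
    return "equal"

# Hypothetical phrases, each a list of note durations in beats.
phrases = [
    [0.5, 1.0, 1.0, 2.0],   # "pick-up note": short-long opening
    [1.0, 0.5, 0.5, 1.0],   # long-short opening
    [1.0, 1.0, 2.0],        # equal opening
]
print(Counter(opening_pattern(p) for p in phrases))
```

Tallying such counts over a corpus of songs from each culture would reveal any bias toward a particular opening pattern, which is the form of evidence the study reports.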
谈到语言,英语和日语的一个基本区别在于词序 (Baker, 2001)。例如,在英语中,短语法(或“功能”)词,如“the”、“a”、“to”等,出现在短语的开头,并与较长的有意义的(或“内容”)结合单词(例如名词或动词)。功能词通常是“减少的”,持续时间短,重音低。这会创建以短元素开头并以长元素结尾的频繁语言块,例如“狗”、“吃”、“一张大桌子”等。关于英语的这一事实长期以来一直被诗人用来创造英语最常见的诗歌形式——抑扬格五音步。
Turning to language, one basic difference between English and Japanese concerns word order (Baker, 2001). For example, in English, short grammatical (or “function”) words such as “the,” “a,” “to,” and so forth, come at the beginning of phrases and combine with longer meaningful (or “content”) words (such as a noun or verb). Function words are typically “reduced,” having short duration and low stress. This creates frequent linguistic chunks that start with a short element and end with a long one, such as “the dog,” “to eat,” “a big desk,” and so forth. This fact about English has long been exploited by poets in creating the English language’s most common verse form, iambic pentameter.
Japanese, in contrast, places function words at the ends of phrases. Common function words in Japanese include “case markers,” short sounds that can indicate whether a noun is a subject, direct object, indirect object, and so forth. For example, in the sentence “John-san-ga Mari-san-ni hon-wo age-mashita,” (“John gave a book to Mari”) the suffixes “ga,” “ni,” and “wo” are case markers indicating that John is the subject, Mari is the indirect object and “hon” (book) is the direct object. Placing function words at the ends of phrases creates frequent chunks that start with a long element and end with a short one, which is just the opposite of the rhythm of short phrases in English (cf. Morgan et al., 1987).
除了短语之外,语言中其他有意义的短块是单词。因为我们的感知实验侧重于双元素组,所以我们检查了英语和日语中常见双音节词的时间形状。英语双音节词倾向于在第一个音节上重读(例如,MO-ney, MAY-be; Cutler & Carter, 1987),这可能让人认为它们的音节时长有长-短节奏模式。为了测试这一点,我们检查了语言中 50 个最常见的双音节词的音节持续时间模式(来自自发语音语料库),并测量了两个音节的相对持续时间。令人惊讶的是,第一个音节重音的常用词并没有强烈偏向长-短持续时间模式。相反,重音在第二个音节上的常用词,例如“a-BOUT”、“be-CAUSE”和“be-FORE”具有非常强烈的短-长持续时间模式。因此,英语中常见的双音节词的平均持续时间模式是短-长(图 3.20)。
Apart from short phrases, the other short meaningful chunks in language are words. Because our perception experiment focused on two-element groups, we examined the temporal shape of common disyllabic words in English and Japanese. English disyllabic words tend to be stressed on the first syllable (e.g., MO-ney, MAY-be; Cutler & Carter, 1987), which might lead one to think that they would have a long-short rhythmic pattern of syllable duration. To test this, we examined syllable duration patterns for the 50 most common disyllabic words in the language (from a corpus of spontaneous speech), and measured the relative duration of the two syllables. Surprisingly, common words with stress on the first syllable did not have a strong bias toward a long-short duration pattern. In contrast, common words with stress on the second syllable, such as “a-BOUT,” “be-CAUSE,” and “be-FORE,” had a very strong short-long duration pattern. Thus the average duration pattern for common two-syllable words in English was short-long (Figure 3.20).
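The duration-ratio measurement described above can be sketched in a few lines of code. This is purely an illustration of the method, not the study's analysis script, and the syllable durations below are invented for the example rather than taken from the speech corpus (the actual study also weighted words by their frequency):

```python
# Illustrative sketch of the syllable duration-ratio analysis.
# Durations (in ms) are made up; real values come from corpus measurements.
def syllable_ratio(first_ms, second_ms):
    """Ratio of second- to first-syllable duration (>1 means short-long)."""
    return second_ms / first_ms

words = {
    "money": (150, 130),   # initial stress: weak long-short pattern
    "about": (80, 200),    # final stress: strong short-long pattern
    "before": (90, 210),   # final stress: strong short-long pattern
}
ratios = [syllable_ratio(a, b) for a, b in words.values()]
mean_ratio = sum(ratios) / len(ratios)
print("average pattern:", "short-long" if mean_ratio > 1 else "long-short")
# prints: average pattern: short-long
```

With a full word list, the mean ratio summarizes the language's overall bias, which is the quantity compared between English and Japanese in the text.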
Figure 3.20 Distribution of syllable duration ratios for common two-syllable words in spontaneous speech in American English. Separate histograms are shown for initial-stress versus final-stress words in (a) and (b), and combined data are shown in (c), weighted by word frequency. Averages indicated by arrowheads. The overall distribution in (c) has a significant short-long bias (average ratio = 1 : 1.11).
This means that a short-long rhythm pattern is reflected at both the level of small phrases and common disyllabic words in English. We also examined syllable duration patterns in the 50 most common disyllabic words in Japanese. In contrast to English, the average duration pattern for such words was long-short. Thus once again, linguistic rhythm mirrored the results of the perception experiment.
Taking a step back, our results show that the perception of rhythmic grouping, long thought to follow universal principles, actually varies by culture. Our explanation for this difference is based on the rhythms of speech. Specifically, we suspect that learning the typical rhythmic shape of phrases and words in the native language has a deep effect on rhythm perception in general. If our idea is correct, then rhythmic grouping preferences should be predictable from the temporal structure of small linguistic chunks (phrases and words) in a language.
These findings highlight the need for cross-cultural work when it comes to testing general principles of auditory perception. Much of the original work on rhythmic grouping of tones was done with speakers of Western European languages (e.g., English, Dutch, and French). Although these languages do indeed have important differences, they all follow the pattern of putting short function words at the onset of small linguistic phrases, which may account for the similarity of perceptual grouping in these cultures. A more global perspective reveals that languages with phrase-final short function words are widespread, but exist largely outside of Europe, for example, in India and East Asia (Haspelmath et al., 2005). We predict that native speakers of these languages will group tones of alternating duration like Japanese listeners do (long-short).
An important future direction for this work concerns the development of rhythmic grouping preferences in childhood. Do infants have an innate bias for a particular grouping pattern (e.g., short-long), which is then modified by experience (cf. Trainor & Adams, 2000)? Or are they rhythmic “blank slates”? Regarding the perception of rhythm by adults, if speakers of different languages perceive nonlinguistic rhythm differently, this could help explain reports of differences between Westerners and Japanese in the performance of simple musical rhythms (Ohgushi, 2002; Sadakata et al., 2004). That is, simple rhythms may be performed differently in different cultures because they are perceived differently during learning. This would indicate that experience with speech shapes nonlinguistic rhythm cognition at a very basic level.
In this chapter, I have claimed that certain aspects of speech rhythm and musical rhythm show a striking similarity, such as the grouping of events into phrases, whereas other aspects are fundamentally different, such as the role of temporal periodicity. To what extent do neural data support this claim? Is there evidence that some aspects of rhythm in speech and music are handled by similar brain systems, whereas other aspects show little neural overlap?
Focusing first on grouping, there is evidence for overlap in brain processing of phrase boundaries in both domains. This evidence comes from electrical brain responses (event-related potentials, ERPs) in normal individuals. Steinhauer et al. (1999) demonstrated that the perception of phrase boundaries in language is associated with a particular ERP component termed the “closure positive shift” (CPS), a centro-parietal positivity of a few hundred milliseconds that starts soon after the end of an intonational phrase. Further studies using filtered or hummed speech (to remove lexical cues and leave prosodic cues) showed that the CPS is sensitive to prosodic rather than syntactic cues to phrase boundaries (Steinhauer & Friederici, 2001; Pannekamp et al., 2005). Inspired by this work, Knösche et al. (2005) examined the ERPs in musicians to the ends of musical phrases, and found a component similar to the CPS reported by Steinhauer et al. Using MEG, they also identified brain areas that were likely to be involved in the generation of the CPS in music. These areas included the anterior and posterior cingulate cortex and the posterior hippocampus. Based on the roles these areas play in attention and memory, the researchers argue that the musical CPS does not reflect the detection of a phrase boundary per se, but memory and attention processes associated with shifting focus from one phrase to the next.
The studies of Steinhauer et al. and Knösche et al. point the way to comparative neural studies of grouping in language and music. There is much room for further work, however. For example, in the Knösche et al. study the sequences with phrase boundaries had internal pauses, whereas the sequences without phrase boundaries did not. It would be preferable to compare sequences with and without phrase boundaries but with identical temporal structure, for example, using harmonic structure to indicate phrasing (cf. Tan et al., 1981). This way, ERPs associated with phrase boundaries could not be attributed to simple temporal differences in the stimuli. It would also be desirable to conduct a within-subjects study of brain responses to phrases in language and music. Such comparative work should attend to the absolute duration of musical versus linguistic phrases, as the neural processes involved in grouping may be influenced by the size of the temporal unit over which information is integrated (Elbert et al., 1991; von Steinbuchel, 1998).
Turning to the question of periodicity, if speech rhythms and periodic musical rhythms are served by different neural mechanisms, then one would predict neural dissociations between linguistic rhythmic ability and the ability to keep or follow a beat in music. The neuropsychological literature contains descriptions of individuals with musical rhythmic disturbance after brain damage, or “acquired arrhythmia” (e.g., Mavlov, 1980; Fries & Swihart, 1990; Peretz, 1990; Liégeois-Chauvel et al., 1998; Schuppert et al., 2000; Wilson et al., 2002; Di Pietro et al., 2003). Two notable findings from this literature are that rhythmic abilities can be selectively disrupted, leaving pitch processing skills relatively intact, and that there are dissociations between rhythmic tasks requiring simple discrimination of temporal patterns and those requiring the evaluation or production of periodic patterns (e.g., Peretz, 1990). For example, Liégeois-Chauvel et al. (1998) found that patients with lesions in the anterior part of the left or right superior temporal gyrus were much more impaired on a metrical task than on a temporal discrimination task. The metrical task involved identifying a passage as a waltz or a march, whereas the temporal discrimination task involved a same/different judgment on short melodic sequences that differed only in terms of their duration pattern. In the metrical task, patients were encouraged to tap along with the perceived beat of the music to help them in their decision. Wilson et al. (2002) describe a case study of a musician with a right temporo-parietal stroke who could discriminate nonmetrical rhythms but who could not discriminate metrical patterns or produce a steady pulse.
Unfortunately, none of these studies explicitly set out to compare rhythmic abilities in speech and music. Thus the field is wide open for comparative studies that employ quantitative measures of both speech and musical rhythm after brain damage. It would be particularly interesting to study individuals who were known to have good musical rhythmic abilities and normal speech before brain damage, and to examine whether disruptions of speech rhythm are associated with impaired temporal pattern discrimination, impaired metrical abilities, or both.
Another population of individuals who would be interesting to study with regard to speech and musical rhythm are individuals with “foreign accent syndrome” (Takayama et al., 1993). In this rare disorder, brain damage results in changes in speech prosody that give the impression that the speaker has acquired a foreign accent. It remains to be determined if this disorder is associated with systematic changes in speech rhythm, but if so, one could examine if such individuals have any abnormalities in their musical rhythmic skills.
Of course, a difficulty in studying acquired arrhythmia and foreign accent syndrome is that such cases are quite rare. Thus it would be preferable to find larger populations in which either speech rhythm or musical rhythmic abilities were impaired, in order to conduct comparative research. One population that holds promise for comparative studies is tone-deaf or “congenital amusic” individuals who have severe difficulties with music perception and production that cannot be attributed to hearing loss, lack of exposure to music, or any obvious nonmusical social/cognitive impairments (Ayotte et al., 2002). One advantage of working with such individuals is that they can easily be found in any large community through a process of advertising and careful screening (Ayotte et al., 2002; Foxton et al., 2004). Such individuals appear to have problems with basic aspects of pitch processing, such as discriminating small pitch changes or determining the direction of small pitch changes (i.e., whether pitch goes up or down) (Peretz & Hyde, 2003; Foxton et al., 2004). Interestingly, they do not seem to be impaired in discriminating simple temporal patterns and can synchronize successfully to a simple metronome. However, they do have difficulty synchronizing to the beat of music (Dalla Bella & Peretz, 2003). Of course, it could be that the difficulty in synchronizing with music is simply due to the distraction caused by a stimulus with pitch variation, due to deficits in pitch processing (cf. Foxton et al., 2006). Thus future studies of beat perception in congenital amusia should use complex rhythmic sequences with no pitch variation, such as those used in the study of Patel, Iversen, et al. (2005) described in section 3.2.1 above (cf. Sound Examples 3.3 and 3.4).
If musically tone-deaf individuals cannot synchronize to the beat of such sequences, this would suggest that the mechanisms involved in keeping a beat in music have nothing to do with speech rhythm (because the speech of musically tone-deaf individuals sounds perfectly normal).29
I suspect that future research will reveal little relationship between speech rhythm abilities in either production or perception and musical rhythm abilities involving periodicity (such as metrical discrimination or beat perception and synchronization). This would support the point that periodicity does not play a role in speech rhythm.
Speech and music involve the systematic temporal, accentual, and phrasal patterning of sound. That is, both are rhythmic, and their rhythms show both important similarities and differences. One similarity is grouping structure: In both domains, elements (such as tones and words) are grouped into higher level units such as phrases. A key difference is temporal periodicity, which is widespread in musical rhythm but lacking in speech rhythm. Ironically, the idea that speech has periodic temporal structure drove much of the early research on speech rhythm, and was the basis for a rhythmic typology of languages which persists today (stress-timed vs. syllable-timed languages). It is quite evident, however, that the notion of isochrony in speech is not empirically supported. Fortunately, much recent empirical research on speech rhythm has abandoned the notion of isochrony, and is moving toward a richer notion of speech rhythm based on how languages differ in the temporal patterning of vowels, consonants, and syllables. A key idea that motivates this research is that linguistic rhythm is the product of a variety of interacting phonological phenomena, and not an organizing principle, unlike the case of music.
It may seem that breaking the “periodicity link” between speech and music would diminish the chance of finding interesting rhythmic relations between the domains. In fact, the converse is true. Changing the focus of comparative work from periodic to nonperiodic aspects of rhythm reveals numerous interesting connections between the domains, such as the reflection of speech timing patterns in music, and the influence of speech rhythms on nonlinguistic rhythmic grouping preferences. Although many more connections await exploration, it seems clear that some of the key processes that extract rhythmic structure from complex acoustic signals are shared by music and language.
This is an appendix for section 3.3.1, subsection “Duration and Typology.” The nPVI equation is:

\[ \mathrm{nPVI} = 100 \times \frac{1}{m-1} \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right| \]
In this equation, m is the number of durations in the sequence (e.g., vowel durations in a sentence) and d_k is the duration of the kth element. The nPVI computes the absolute value of the difference between each successive pair of durations in a sequence, normalized by the mean of these two durations (this normalization was originally introduced to control for fluctuations in speech rate). This converts a sequence of m durations to a sequence of m – 1 contrastiveness scores. Each of these scores ranges between 0 (when the two durations are identical) and 2 (for maximum durational contrast, i.e., when one of the durations approaches zero). The mean of these scores, multiplied by 100, yields the nPVI of the sequence. The nPVI value for a sequence is thus bounded by lower and upper limits of 0 and 200, with higher numbers indicating a greater degree of durational contrast between neighboring elements.
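The computation just described is easy to express directly. The following is a minimal sketch of the nPVI as defined above (my own illustration, not code from the cited studies), applied to a list of durations in milliseconds:

```python
def npvi(durations):
    """Normalized Pairwise Variability Index (nPVI) of a duration sequence.

    Each successive pair contributes |d_k - d_(k+1)| normalized by the
    pair's mean; the mean of these m - 1 scores, times 100, is the nPVI.
    Values are bounded by 0 (all durations equal) and 200 (maximal contrast).
    """
    if len(durations) < 2:
        raise ValueError("nPVI requires at least two durations")
    scores = [abs(d1 - d2) / ((d1 + d2) / 2.0)
              for d1, d2 in zip(durations, durations[1:])]
    return 100.0 * sum(scores) / len(scores)

# A perfectly regular sequence has nPVI = 0; strict long-short alternation
# pushes the value upward.
print(npvi([100, 100, 100]))    # 0.0
print(npvi([150, 50, 150, 50])) # 100.0 (each pair scores 100/100 = 1.0)
```

Because each pairwise score is normalized by the local mean duration, uniformly speeding up or slowing down a sentence leaves its nPVI unchanged, which is the rate-control property mentioned above.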
This is an appendix for Chapter 3, section 3.5.1, Table 3.1. Data kindly provided by David Huron.
In the tables below, % C = number of composers, sd = standard deviation.
Data from the above table regrouped by language: English = American, English, Irish; French = French, Belgian; German = German, Austrian, Austro-Hungarian; Slavic = Russian, Czech, Polish, Bohemian; Spanish = Spanish, Catalan, Cuban, Mexican; Scandinavian = Danish, Norwegian, Swedish (not Finnish)
1 This corresponds to a time signature of 6/8, in contrast to a waltz, which has a time signature of 3/4. As one can see, although 6/8 = 3/4 in mathematical terms, these ratios refer to rather different forms of organization in a musical context.
2 An apparent exception occurs in long rhythmic cycles of Indian classical music (e.g., cycles of 16 beats or more), in which the strong accent on the first beat of each cycle can be separated by 10 seconds or more, yet plays an important perceptual role in the music. However, this may be an exception that proves the rule, as explicit counting of the beats by the audience is part of the listening tradition in this music. That is, conscious effort is expended in order to keep track of where the music is in its long metrical cycle.
3 Neural activity in the beta frequency band has been associated with the motor system, raising the possibility that meter perception in the brain involves some sort of coupling between the auditory and motor system, even in the absence of overt movement.
4 Schulkind et al. (2003) conducted an interesting analysis of melodic structure that helps suggest why phrase boundaries might be important landmarks in melody recognition. They examined the temporal distribution of pitch and temporal accents in melodies, in which pitch accents were defined as notes that were members of the tonic triad or points of contour change, and temporal accents were defined as relatively long notes and metrically accented notes. They found that accent density was higher at phrase boundaries than within phrases. (For those unfamiliar with the concept of a tonic triad, it is explained in Chapter 5). Thus the edges of phrases are structural slots that attract accents of various kinds.
5 In a survey of 21 typologically different languages, Jun (2005) found that all languages had at least one grouping level above the word, and most had two.
6 More specifically, the lengthening was confined to the syllabic rime (the vowel and following consonants). Interestingly, this asymmetric expansion of the syllable due to prosodic boundaries differs from temporal changes in a syllable due to stress (Beckman et al., 1992).
7 In doing this research, it would be important to be aware of durational lengthening in the vicinity of phrase boundaries in both speech and music: If there are different degrees of preboundary lengthening in the two domains, then events near boundaries should be excluded from the analysis as this would be confounded with variability measures.
8 I am grateful to Bruno Repp for providing me with this data.
9 Perceived loudness incorporates both physical intensity and the distribution of energy across different frequency bands, in other words, “spectral balance.” The latter may be a more salient and reliable cue than the former (Sluijter & van Heuven, 1996, Sluijter et al., 1997).
10 In a pitch-accent language, a word can have entirely different meaning depending on its pitch pattern. The difference between a tone language and a pitch-accent language is that in the former there is a prescribed pitch for each syllable, whereas in pitch-accent languages a certain syllable of a word may have lexical specification for pitch (Jun, 2005).
11 As an example, Ladefoged points out that the word kakemono (scroll) takes about the same amount of time to say as “nippon” (Japan), and attributes this to the fact that both words contain four morae: [ka ke mo no] and [ni p po n].
12 Abercrombie’s theory of syllables as rooted in chest pulses has also been falsified. It should be noted that Abercrombie was a pioneering scientist who established one of the first laboratories devoted to basic research in phonetics (in Edinburgh). His ideas about speech rhythm are but a tiny slice of his work, and though wrong, stimulated a great deal of research.
13 Syllables are generally recognized as having three structural slots: the onset consonant(s), the nucleus (usually occupied by a vowel), and the following consonants (referred to as the coda). A syllable with one consonant in the onset and none in the coda is represented by CV, whereas CCVC means two consonants in the onset and one in the coda, and so on.
14 Dauer also suggested that stress- and syllable-timed languages had a different relationship between stress and intonation: In the former, stressed syllables serve as turning points in the intonation contour, whereas in the latter, intonation and stress are more independent. As intonation is not discussed in this chapter, this idea will not be pursued here.
15 Although Ramus et al. (1999) related differences in ΔC and %V to syllable structure, Frota and Vigário (2001) point out that in the BP/EP case, differences in these variables are driven by vowel reduction, because syllable structures are similar in the two varieties. See Frota and Vigário (2001) for details. Frota and Vigário also provide a very useful discussion of the ΔC parameter and the need to normalize this variable for overall sentence duration/speech rate (see also Ramus, 2002a).
16 Individual researchers who are comparing syllable duration patterns across two languages can also handle the problem of syllable boundary identification by making all such decisions in a manner that is conservative with regard to the hypothesis at hand. For example, if comparing languages A and B with the hypothesis that syllables are more variable in duration in the sentences of language A, then any judgment calls about syllable boundaries should be made in such a way as to work against this hypothesis.
17 The correlation between nPVI and rhythm class is not perfect, however. Grabe and Low (2002) found a high vowel nPVI value for Tamil, a language which has been classified as syllable-timed (cf. Keane, 2006). It should be noted that Grabe and Low’s (2002) results should be considered preliminary because only one speaker per language was studied. Subsequent work has applied the nPVI to cross-linguistic corpora with fewer languages but more speakers per language (e.g., Ramus, 2002b; Lee & Todd, 2004; Dellwo, 2004).
18 It should be noted that both Grabe and Low (2002) and Ramus (2002b) measured the nPVI of vocalic intervals, defined as vowels and sequences of consecutive vowels irrespective of syllable and word boundaries, whereas Bolinger’s arguments are focused on individual vowels. This is not a serious problem because most vocalic intervals are individual vowels due to the strong tendency for vowels to be separated by consonants in speech. For example, in the database used by Ramus (eight languages, 160 sentences), there are 2,725 vowels, out of which 2,475 (91%) are singletons, in other words, a single vowel flanked by a consonant on either side (or if the vowel is the first or last phoneme of the sentence, a single vowel flanked by a following or preceding consonant, respectively). Thus it is likely that nPVI measurements based on individual vowels would produce qualitatively similar results as those based on vocalic intervals (cf. Patel et al., 2006).
20 It would be desirable for future research on rhythmic classes to suggest new names for rhythmic classes, as the current names (stress-timed, syllable-timed, and mora-timed) are implicitly bound up with the (failed) notion of isochrony.
21 One possible confound in the elegant study of Nazzi et al. (1998) is the presence of intonation, which may have played a role in the infants’ discrimination. Indeed, Ramus et al. (2000) found that French newborns could distinguish Dutch from Japanese using resynthesized saltanaj speech, but that their discrimination ability was much weaker when the original F0 contours of the sentences were replaced by the same artificial contours (Ramus, 2002b). He also notes that intonation can be removed entirely using flat sasasa resynthesis, but that the resulting sound patterns are problematic for use with newborns and infants, who may find them boring or distressing.
22 An early conceptual link between hierarchical theories of linguistic rhythm and theories of musical structure was made by Jackendoff (1989), who noted a structural equivalence between one type of prosodic tree structure used to depict hierarchical prominence relations in language and a type of tree used by Lerdahl and Jackendoff (1983) to indicate the relative structural importance of events in a span of musical notes. Jackendoff speculated that the coincidence of these two formalisms might reflect the fact that language and music use different specializations of general-purpose mental principles for assigning structure to temporal patterns, in other words, principles that parse sound sequences into recursive hierarchies of binary oppositions of structural importance. As noted by Jackendoff, however, prosodic trees have been largely abandoned in theories of speech rhythm.
23 Such a study will have to be careful to try and match acoustic properties of the onset of the spoken and musical sound. For example, if a musical sound with a sharp attack is used, such as a piano tone, then a speech sound with a plosive onset (such as /ta/) should be used rather than one with a gradual onset (such as /la/).
24 I am grateful to Laura Dilley for marking stressed syllables in these sentences.
25 I am grateful to Graeme Boone for bringing this correspondence to my attention.
26 Although this is true for English and French, it should be noted that in Japanese it is the mora and not the syllable that gets mapped onto a musical note (Hayes, 1995a).
27 The nPVI values for English and French speech shown in Figure 3.14 are taken from Patel et al. (2006), rather than from Patel and Daniele (2003a). Both studies show a significant difference between the two languages (English nPVI > French nPVI), but the 2006 study is based on more accurate measurements. See Patel et al. (2006) for measurement details, and for a list of all sentences analyzed.
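The nPVI referred to in this note has a standard published formula (Grabe & Low, 2002; Patel & Daniele, 2003a): 100 times the mean, over successive pairs of durations, of the absolute difference between the two durations divided by their average. A minimal sketch in Python; the duration values in the example are illustrative only and are not taken from the studies cited:

```python
def npvi(durations):
    """Normalized Pairwise Variability Index (Grabe & Low, 2002).

    Takes a sequence of interval durations (e.g., vowel durations in ms)
    and returns 100 * the mean normalized difference between successive
    durations. Higher values indicate greater durational contrast
    between neighboring intervals.
    """
    if len(durations) < 2:
        raise ValueError("nPVI requires at least two durations")
    diffs = [
        abs(d1 - d2) / ((d1 + d2) / 2.0)
        for d1, d2 in zip(durations, durations[1:])
    ]
    return 100.0 * sum(diffs) / len(diffs)

# Illustrative values: an isochronous sequence yields 0, whereas an
# alternating long-short sequence yields a high nPVI.
print(npvi([100, 100, 100, 100]))            # 0.0
print(round(npvi([200, 100, 200, 100]), 1))  # 66.7
```

Because each pairwise difference is normalized by the local mean duration, the index is insensitive to overall speech rate, which is one reason it has been favored for cross-linguistic rhythm comparisons.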
28 Unlike German music, English and French music do not show a significant increase in nPVI over the equivalent time period, based on themes in Barlow and Morgenstern’s dictionary (Greig, 2003). This raises an interesting musicological puzzle: Why do German and Austrian music show such a strong historical change in this measure of rhythm, whereas English and French music do not?
29 Of course, before any firm conclusions can be drawn, the speech rhythm of tone-deaf individuals would need to be quantitatively measured to show that it did not differ from normal controls (for example, using the nPVI). Also, it is possible that tone-deaf individuals cannot keep a beat because their pitch perception problem has caused an aversion to music, so that they have not had enough exposure to music to learn how to keep a beat. Thus it would be preferable to work with individuals who have normal pitch perception and who enjoy music, but who cannot keep a beat. The existence of such “rhythm-deaf” individuals is intuitively plausible, as there are certainly people who like music but claim to have “two left feet” when it comes to dancing, and/or who cannot clap along with a beat. It should be possible to find a population of such people through a process of advertising and screening, akin to the procedures used to find congenital amusics.
Chapter 4
Melody
4.1 Introduction
4.1.1 Important Differences Between Musical and Linguistic Melody
4.1.2 A Brief Introduction to Linguistic Intonation
4.2 Melody in Music: Comparisons to Speech
4.2.1 Grouping Structure
4.2.2 Beat and Meter
4.2.3 Melodic Contour
4.2.4 Intervallic Implications
4.2.5 Motivic Similarity
4.2.6 Tonality Relations: Pitch Hierarchies
4.2.7 Tonality Relations: Event Hierarchies
4.2.8 Tonality Relations: Implied Harmony
4.2.9 Meta-Relations
4.2.10 Musical Versus Linguistic Melody: Interim Summary
4.3 Speech Melody: Links to Music
4.3.1 Intonation and Phonology
4.3.2 Intonation and Perception
4.4 Interlude: Musical and Linguistic Melody in Song
4.5 Melodic Statistics and Melodic Contour as Key Links
4.5.1 Melodic Statistics
4.5.2 Melodic Contour Perception
Melodic Contour Perception in Acquired Amusia
Melodic Contour Perception in Musical Tone Deafness
The Melodic Contour Deafness Hypothesis
4.6 Conclusion
Appendix
Melody is an intuitive concept that is hard to define. Standard dictionary definitions often prove unsatisfactory. For example, “pitched sounds arranged in musical time in accordance with given cultural conventions and constraints” (Ringer, 2001:363) invokes “musical time.” Yet the notion of melody is not confined to music, because linguists have long used the term to refer to organized pitch patterns in speech (Steele, 1779; ’t Hart et al., 1990). It may seem that dropping “musical” from the above definition would solve this problem, but the resulting description would allow very simple tone sequences, such as the alternating two-tone pattern of European ambulance sirens. Such patterns hardly qualify as “melodic.”
Can a single definition of melody be found that encompasses both music and speech? One possibility is “an organized sequence of pitches that conveys a rich variety of information to a listener.” This definition emphasizes two points. First, melodies are tone sequences that pack a large informational punch. For example, speech melody can convey affective, syntactic, pragmatic, and emphatic information. Musical melody can also convey a broad variety of information, as detailed in section 4.2 below. The second point is that a tone sequence qualifies as a melody by virtue of the rich mental patterns it engenders in a listener. That is, melody perception is a constructive process by which the mind converts a sequence of tones into a network of meaningful relationships.
How do melody in speech and music compare in terms of structure and cognitive processing? A first step in addressing this question is to define the scope of the problem. This chapter focuses on that aspect of speech melody known as intonation, in other words, organized pitch patterns at the postlexical level (Jun, 2003, 2005). Such patterns do not influence the semantic meaning of individual words. This is in contrast to lexical pitch contrasts, which occur in tone languages such as Mandarin and Yoruba and in pitch-accent languages such as Swedish and Japanese. (Although section 4.4 of this chapter touches on lexical pitch contrasts, a more detailed treatment of the relation between lexical tones and music is given in Chapter 2.) Furthermore, this chapter examines music’s relation to linguistic intonation, that aspect of speech melody that conveys structural (vs. affective) information (cf. section 4.1.2). Music’s relation to affective intonation is explored in Chapter 6 (on meaning in music and language). Linguistic and affective intonation are treated in different chapters because they are conceptually distinct and are associated with different bodies of research. Thus henceforth in this chapter, “intonation,” “linguistic melody,” and “speech melody” refer to linguistic intonation unless otherwise specified.
Any systematic comparison of musical and linguistic melody must acknowledge at the outset that there are important differences between them. First and foremost, most musical melodies are built around a stable set of pitch intervals, whereas linguistic melodies are not (cf. Chapter 2). Although the precise set of intervals in musical melodies varies by culture, the organization of pitch in terms of intervals and scales is a salient difference between music and ordinary speech. The consequences of this difference are profound. For example, a stable system of intervals allows musical melodies to make use of a tonal center, a focal pitch that serves as a perceptual center of gravity for the melody. An interval system also allows the creation of a hierarchy of pitch stability in melodies, as discussed in section 4.2.6. In contrast, the “tones” of intonation have no such organization: Each tone is used where it is linguistically appropriate and there is no sense in which some are more stable or central than others.1 Another consequence of an interval system is that when combined with a temporal grid provided by beat and meter, a scaffolding is created for an elaborate set of structural relations between tones, as discussed in section 4.2.2. This is likely part of what makes musical melodies so aesthetically potent. In contrast, the network of pitch relations in intonation contours is not nearly as rich. As a result, intonation contours are aesthetically inert, as evidenced by the fact that people rarely hum intonation contours or find themselves captivated by the pitch patterns of speech.2 This is quite sensible, as a musical melody is an aesthetic object, a sound sequence that is an end in itself, whereas a linguistic intonation contour is simply a means to an end, in other words, pitch in the service of quotidian linguistic functions. 
If a musical melody is “a group of tones in love with each other” (Shaheen, quoted in Hast et al., 1999), then a linguistic melody is a group of tones that work together to get a job done.
A second difference between musical and linguistic melody is that the perception of the latter appears to be influenced by a particular kind of expectation that is unique to speech. This expectation concerns a phenomenon known as “declination,” a gradual lowering of the baseline pitch and narrowing of pitch range over the course of an utterance. This phenomenon may have a physiological basis in the decline of air pressure driving vocal fold vibration (Collier, 1975).3 Listeners appear to take this phenomenon into account when making judgments about the equivalence of pitch movements in earlier versus later portions of an utterance (Pierrehumbert, 1979; Terken, 1991). For example, Terken constructed synthetic intonation contours for the 7-syllable nonsense syllable utterance /mamámamamamáma/, with pitch accents on the second and sixth syllable, as shown schematically in Figure 4.1.
In one condition, there was no baseline declination and the participants’ task was to adjust the height of the second pitch movement so that it had the same prominence as the first movement. Terken found listeners made the second peak significantly lower than the first, as if they were influenced by an expectation for F0 declination over the course of the utterance. (That is, a physically smaller movement later in a sentence seemed just as prominent as a larger movement earlier in the sentence, because of an implicit expectation that pitch range is narrower later in an utterance.) Interestingly, Terken also included a condition in which listeners were instructed to equalize the peak pitch of the two movements, rather than to equate their prominence. In this case, the listeners behaved differently: They made the two peaks much closer in pitch, even though the height of the second peak was still adjusted to be below the first peak. This demonstrated that listeners use different strategies when judging pitch versus prominence, which in turn implies differences in the perceptual mechanisms mediating spoken and musical melodies.
A third difference concerns neuropsychological dissociations between melody in speech and music. For example, musical tone-deafness, which is discussed in detail later in this chapter, is associated with severe problems with melodic production and perception. In terms of production, musically tone-deaf individuals often make contour errors when singing (e.g., going up in pitch when the original melody goes down), a kind of error that is rarely made by ordinary individuals, even those with no musical training (Giguère et al., 2005; Dalla Bella et al., 2007). In terms of perception, they are typically unaware when music is off-key (including their own singing), and have difficulty discriminating and recognizing melodies. Despite these severe problems with musical melody, these individuals do not suffer from any obvious problems with speech intonation production or perception (Ayotte et al., 2002).
Figure 4.1 Schematic diagram of the intonation contours used by Terken (1991). See text for details.
Given the differences outlined in this section, it may seem the prospects for finding structural or cognitive relations between melody in speech and music are dim. In fact, such a judgment is premature. Section 4.2 below explores musical melody and identifies several meaningful parallels to speech. Section 4.3 then examines two modern lines of research on linguistic intonation that suggest significant links to musical melody. After a brief interlude in section 4.4 (on linguistic and musical melody in song), the final section of the chapter discusses two areas in which empirical research is demonstrating interesting connections between the structure and processing of spoken and musical melody.
As background for the discussions that follow, it is worth providing some basic information on linguistic intonation (further detail is given in section 4.3). The primary determinant of speech melody is the fundamental frequency (F0) of the voice, which is the basic rate of vibration of the vocal folds. Speakers influence F0 via the tension of their vocal folds and via air pressure beneath the folds (subglottal pressure): Greater tension or air pressure leads to faster vibration and higher pitch, and vice versa.4 The average range of F0 variation over the course of an utterance is about one octave (Hudson & Holbrook, 1982; Eady, 1982), and the average F0 of women is about one octave above that of adult men.5
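The octave relations mentioned above correspond to doublings of frequency, so the octave distance between two frequencies is simply the base-2 logarithm of their ratio. A minimal sketch; the specific F0 values below are illustrative assumptions, not figures from the studies cited:

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves separating two frequencies; one octave = a doubling."""
    return math.log2(f_high_hz / f_low_hz)

# Assumed illustrative averages: ~110 Hz for an adult male voice and
# ~220 Hz for an adult female voice -- one octave apart.
print(octaves_between(110.0, 220.0))  # 1.0

# An utterance whose F0 ranges from ~100 Hz to ~200 Hz likewise spans
# about one octave.
print(octaves_between(100.0, 200.0))  # 1.0
```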
Through its structured use in speech, linguistic intonation can convey syntactic, pragmatic, and emphatic information, as well as signaling prosodic grouping patterns (Cutler et al., 1997). The English language provides examples of the first three of these functions, and French provides a good example of the fourth. Syntactically, pitch can help to disambiguate the structure of spoken utterances. For example, in a sentence such as “The investigator found the uncle of the businessman who was wanted by the police,” there is ambiguity regarding the relative clause “who was wanted by the police”: it may modify one of two nouns: “uncle” or “businessman.” Listeners’ syntactic interpretation of the sentence is influenced by pitch: They tend to interpret the relative clause as modifying the noun that bears a salient pitch accent (Schafer et al., 1996, cf. Speer et al., 2003). An example of pitch’s role in pragmatics is the use of a pitch rise at the end of certain English utterances to mark them as questions (e.g., “You’re going” vs. “You’re going?”). The role of pitch in emphasis is well known: Combined with other cues (such as duration), pitch in English can be used to place focus on a particular word in a sentence, as in “I wanted to go TONIGHT, not tomorrow.” Finally, the role of pitch in signaling prosodic grouping structure is illustrated by French, which uses pitch rises to mark the boundaries of short phrases within sentences (as discussed further in section 4.2.1).
Scientific research on intonation typically relies on the F0 contours of sentences. One such contour is shown in Figure 4.2, for the sentence “Having a big car is not something I would recommend in this city,” as spoken by a female speaker of British English (cf. Sound Example 4.1).
A salient aspect of F0 evident in this figure, and in speech generally, was articulated by Joshua Steele over two centuries ago: “the melody of speech moves rapidly up or down by slides, wherein no graduated distinction of tones or semitones can be measured by the ear; nor does the voice . . . ever dwell distinctly, for any perceptible space of time, on any certain level or uniform tone, except the last tone on which the speaker ends or makes a pause” (Steele, 1779:4).6 The sinuous trajectory of F0 in speech stands in contrast to the sequence of discrete pitches produced by many musical instruments, and raises a fundamental question: How can spoken melodies be compared to musical ones?
Figure 4.2 A sentence of British English spoken by a female speaker. Top: Acoustic waveform, with syllable boundaries marked by thin vertical lines. Bottom: F0 contour.
Fortunately for speech-music research, modern linguistics has a number of systems of intonation analysis in which F0 contours are mapped onto sequences of discrete tones. Two such systems are briefly introduced here. They are discussed in more detail in section 4.3, because of their particular relevance to speech-music comparisons.
The first such system, based on a phonological analysis of intonation, is illustrated in Figure 4.3.
Figure 4.3 The sentence of Figure 4.2 with the F0 contour marked for intonational tones according to the ToBI system. Upward arrows indicate the time points in the F0 contour that correspond to these tones. In the waveform, only those syllables that bear a tone are transcribed. Notation for tones: H = high, L = low, * = tone associated with a stressed syllable, L+H = bitonal low-high accent, ! = downstepped tone. (Downstep is discussed in section 4.3. For simplicity, edge tones have been omitted in this figure.)
This figure shows the F0 contour in Figure 4.2 annotated with phonological “tones” that mark the pitch accents in the sentence according to a prominent theory of speech intonation, the “autosegmental metrical” (AM) theory. The tone labels are taken from the “tones and break indices,” or ToBI, system of conventions, which has its origins in AM theory (Beckman et al., 2005). As detailed later in this chapter, this theory posits that intonation contours can be decomposed into sequences of discrete linguistic tones, which are associated with specific points in the F0 contour. The remainder of the intonation contour, which lies between these points, is considered a mere interpolation, so that most of the F0 trajectory is not linguistically significant. AM theory typically uses just two pitch levels: high (H) and low (L), though these do not correspond to fixed frequency values. Instead, they represent phonological categories whose precise realization in terms of pitch values depends strongly on context.
A different system of intonation analysis is illustrated in Figure 4.4.
In contrast to the AM model, the “prosogram” model (Mertens, 2004a, 2004b) has its roots in psychoacoustic research on pitch perception in speech (as detailed in section 4.3.2). It aims to depict the pitch pattern of a sentence as it is perceived by human listeners. The most notable feature of the prosogram is that it parses the continuous F0 contour into a sequence of syllable tones, many of which have a fixed pitch. In Figure 4.4, for example, all but one syllable (“car”) have been assigned level tones. Although this makes the prosogram look a bit like the “piano roll notation” of a musical melody (cf. Figure 3.1), it is important to note that the level tones of a prosogram (and the pitch intervals between them) do not adhere to any musical scale. (See Figure 4.4 caption for an explanation of the y-axis units on the prosogram, and this chapter’s appendix for formulae relating these units to values in Hz.)
Figure 4.4 Illustration of the prosogram, using the sentence of Figure 4.2. (A) shows the original F0 contour, and (B) shows the prosogram. In both graphs the vertical axis shows semitones relative to 1 Hz (thus 90 st corresponds to 181 Hz; cf. this chapter’s appendix), and musical C3 and C4 are marked for reference. In (B), syllable tones were computed based on the F0 of each vowel. (The choice of the unit of analysis for computing syllable tones is discussed in section 4.3.2.) The temporal onset and offset of each vowel is indicated by vertical dashed lines in (A) and (B). In the syllable transcription above the sentence, vowels have been underlined, and an arrow connects each vowel to its corresponding region in the F0 contour.
Unlike the AM approach to intonation, the prosogram model assigns a tone to each syllable, and thus may seem at odds with the AM approach. However, the models are not contradictory. Recall that tone labeling in AM theory reflects the idea that the listener’s language interpretation system cares only about a few points in the intonation contour, because these define the relevant linguistic contrasts. That is, AM theory is focused on phonology. The prosogram model, in contrast, is concerned with specifying all the pitches that a listener hears when perceiving intonation. Thus the prosogram model is focused on phonetics. The models are not contradictory because even if a listener’s cognitive system cares only about a few pitches in a sentence, it is clear that the listener hears more than just these pitches, because most of the duration of a spoken utterance is accompanied by an F0 contour.
Although further work is needed to integrate the AM and prosogram approaches (cf. Hermes, 2006), this issue will not occupy us in this chapter. For the current purposes, the relevant point is that there are well-motivated systems of intonation analysis that employ discrete tones rather than continuous F0 contours. This allows a number of theoretical and empirical comparisons to be drawn between speech and music (cf. sections 4.3.1 and 4.5.1). For those who are skeptical of both of these approaches to intonation, however, it is worth noting that a good deal of the chapter does not depend on a commitment to either AM or prosogram theory.
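The “semitones relative to 1 Hz” scale used on prosogram plots (cf. Figure 4.4) has a simple closed form: st = 12 · log2(f / 1 Hz). The following is a sketch of that standard conversion in both directions, not code from the chapter’s appendix:

```python
import math

def hz_to_st(f_hz, ref_hz=1.0):
    """Frequency in Hz -> semitones above a reference (default 1 Hz)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def st_to_hz(st, ref_hz=1.0):
    """Semitones above the reference -> frequency in Hz."""
    return ref_hz * 2.0 ** (st / 12.0)

# The caption of Figure 4.4 notes that 90 st re 1 Hz corresponds to
# roughly 181 Hz:
print(round(st_to_hz(90.0)))  # 181
```

Expressing pitch in semitones rather than Hz puts equal musical intervals at equal distances on the axis, which is why the prosogram (and much intonation research) uses a logarithmic scale.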
As pointed out in the introduction, a musical melody is more than a mere succession of pitches: It is a network of interconnected patterns in the mind derived from a sequence of pitch variation. That is, the human perceptual system converts a two-dimensional sequence (pitch vs. time) into a rich set of perceived relationships. One way to appreciate this richness is to list these relationships. This section provides one such list with nine items, and considers each item in terms of comparisons to speech intonation perception. The list is not exhaustive, but it covers many of the fundamental relations in musical melodies. The focus is on the perception of a simple melody by a listener familiar with Western European tonal music, because such melodies have received the greatest amount of empirical research to date. For illustrative purposes, I rely from time to time on a particular musical melody from this tradition, a generic melody chosen because it represents certain general features of melodic structure (the melody K0016, introduced in Chapter 3; cf. Figure 3.1 and Sound Example 3.1).
Grouping refers to the perceptual clustering of tones into chunks larger than single tones but smaller than the entire melody, as discussed in Chapter 3. For example, in listening to the 33 tones of K0016, there is a clear sense that the tones are grouped into five phrases (cf. Figure 3.4). The first two phrase boundaries are demarcated by silences (musical rests), but the boundaries of Phrases 3 and 4 are not marked by any physical discontinuity in the sequence. Instead, these boundaries are preceded by tones that are long and/or low compared to other tones in the same phrase. Although the five phrases marked in Figure 3.4 constitute the most perceptually obvious grouping level in K0016, theories of grouping structure posit that these phrases are but one layer in a hierarchical grouping structure, with lower level groups embedded in higher level ones (e.g., Lerdahl & Jackendoff, 1983; cf. Chapter 3, section 3.2.3).
As discussed in Chapter 3, grouping plays a prominent role in the study of prosody. What is the relationship between grouping and intonation in speech? Research on the perception of prosodic boundaries reveals that salient pitch events can serve as grouping cues in speech (de Pijper & Sanderman, 1994), just as they can in musical melodies (Lerdahl & Jackendoff, 1983). Furthermore, several modern theories of intonation posit that grouping structure has a close relationship with pitch patterning in an utterance. For example, autosegmental-metrical (AM) theory posits that English sentences can have two levels of intonational grouping: the “intermediate phrase” (abbreviated “ip”) and the “intonational phrase” (abbreviated “IP”; Beckman & Pierrehumbert, 1986), both of which are marked by pitch events in the intonation contour called “edge tones.” (In the case of the ip, the edge tone is called a “phrase tone” or “phrase accent,” and in the case of the IP, it is called a “boundary tone.”) This is illustrated in Figure 4.5 (cf. Sound Example 4.2).
Figure 4.5 shows the waveform and F0 contour of a sentence of British English: “In this famous coffee shop you will eat the best donuts in town.” According to AM analysis of the sentence, this sentence has two intermediate phrases. The first ip ends with “shop,” and its right boundary is marked by a lowering in pitch, thought to reflect the presence of a low phrase accent, indicated by L- (cf. section 4.3.1). The next intermediate phrase terminates at the end of the sentence. This point is also marked by a pitch fall, thought to reflect the combined effect of the low phrase accent marking the end of the second ip and the low boundary tone at the end of the full IP (L- and L%, respectively).
Figure 4.5 A sentence of British English spoken by a female speaker. Top: Waveform with syllable boundaries marked by thin vertical lines. Bottom: F0 contour with ToBI tones, as in Figure 4.3. Brackets indicate intermediate phrases according to an AM analysis of the sentence. L- indicates a low edge tone (phrase accent) at the end of each intermediate phrase, and L% indicates a boundary tone at the end of the full intonational phrase.
Pitch lowering is known to be a salient cue to phrase boundaries in speech and music (e.g., Jusczyk & Krumhansl, 1993). It is important to note, however, that phrase boundaries in speech can also be marked by pitch rises. To take an example from French, several theories of French intonation posit that words in French sentences are grouped into short prosodic phrases (e.g., the “accentual phrases” of Jun & Fougeron, 2000, 2002). Notably, it is quite common for such phrases to end with a rising pitch movement if they are not the final phrase in the sentence (cf. Delattre, 1963; Di Cristo, 1998). This is illustrated in Figure 4.6, which shows a sentence of French together with its F0 contour (“La femme du pharmacien va bientôt sortir faire son marché”; cf. Sound Example 4.3). As shown in the figure, the sentence is divided into four accentual phrases: (La femme) (du pharmacien) (va bientôt sortir) (faire son marché).
The right boundaries of all but the final accentual phrase are marked by a rising pitch movement. This association of pitch rises with prosodic grouping boundaries is part of the characteristic melody of spoken French. This stands in contrast to English, in which intermediate phrases do not have a bias to “end high” and often end on a low pitch (Grover et al., 1987).
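This rise-versus-fall contrast at phrase boundaries can be measured directly from F0 tracks. The sketch below classifies the phrase-final pitch movement of an F0 contour; the tail fraction and the one-semitone threshold are illustrative assumptions of mine, not standard values from the intonation literature.

```python
import math

def final_pitch_movement(f0_hz, tail_fraction=0.3, threshold=1.0):
    """Classify the phrase-final pitch movement of an F0 track (in Hz) as
    'rise', 'fall', or 'level', by comparing the last F0 sample with the
    median F0 over the final portion of the phrase."""
    tail = f0_hz[int(len(f0_hz) * (1 - tail_fraction)):]
    median = sorted(tail)[len(tail) // 2]
    change = 12 * math.log2(tail[-1] / median)  # final movement in semitones
    if change > threshold:
        return "rise"
    if change < -threshold:
        return "fall"
    return "level"
```

Applied to the examples above, a French non-final accentual phrase would typically be classified as a "rise," whereas an English intermediate phrase such as the one ending on “shop” would be classified as a "fall."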
This example illustrates an important point about intonation. Part of what makes one language’s intonation different from another is the way in which salient pitch movements align with the grouping structure of a sentence. An informal but compelling demonstration of the perceptual importance of this difference comes from the work of Gunnar Fant and colleagues. These scientists are using speech synthesis to explore intonation patterns in different languages. This approach permits them to graft the intonational patterns of one language onto the words of another. This is illustrated in Sound Examples 4.4a, b, which present two versions of an English sentence (“Along three of the walls there was a stage of rough wooden boards covered with straw”). The first version has English words and English intonation, whereas the second version has English words but French intonation, resulting in the perceptual impression of English spoken with a decidedly French accent. In particular, note how the phrases that end with “walls” and “boards” in the French accent version end on an upward pitch movement, but on a downward pitch movement in the original English version.
Thus it would appear that the relationship between melody and grouping structure is a promising area for comparative speech-music research. For example, one could study the relationship between grouping boundaries and salient pitch peaks in French versus English musical melodies. Are pitch peaks likely to occur closer to the end of phrases in French music than in English music? Does the answer depend on whether songs (in which melodies are written to go with words) or instrumental music is analyzed? In conducting such a study, one must of course decide how to identify phrase boundaries in musical melodies; ideally this would involve perceptual data collected from groups of listeners in segmentation experiments (cf. Deliège, 1987; Frankland & Cohen, 2004).
Figure 4.6 A sentence of French spoken by a female speaker. Top: Waveform with syllable boundaries marked by thin vertical lines. Bottom: F0 contour with phonological tones, according to an AM-style model of French intonation (Jun & Fougeron, 2002). Tone alignment points are marked as in Figure 4.3. Parentheses indicate accentual phrases (APs). The initial tone or tones of each AP (L or L+Hi) are phrase tones, whereas the final tones (H* or L+H*) are pitch accent tones. L% indicates the boundary tone at the end of the intonational phrase.
One can also ask the inverse question: Do the pitch patterns of spoken phrases in one’s native language influence grouping predispositions in music? For example, if French and English speakers were asked to indicate phrase boundaries in musical melodies, would French speakers show a greater tendency to hear boundaries after pitch rises than English speakers? If so, this would suggest that experience with linguistic pitch patterns influences the parsing of musical melodies.
A salient perceptual feature of many musical melodies is a steady beat to which one can synchronize by tapping (see Chapter 3, section 3.2.1 for an extended discussion and for examples using K0016). Most research on beat perception in melodies has focused on the temporal structure of tone sequences, under the assumption that the pitch pattern of a melody contributes little to the sense of a beat. However, empirical research indicates that pitch patterns can make a modest contribution to beat perception, for example, via accents created by melodic contour peaks or valleys (Hannon et al., 2004).
As discussed in Chapter 3, there is no evidence that speech has a regular beat, or has meter in the sense of multiple periodicities. If one abstracts away from regular timing, however, it is interesting to note that pitch variation contributes to rhythm in both domains via the role it plays in making certain events more prominent or accented. For example, contour peaks are perceptually accented points in musical melodies (Thomassen, 1983; Huron & Royal, 1996), which can participate in rhythm by either reinforcing or contradicting the underlying beat (for example, in K0016 the contour peak in the fourth phrase is in a nonbeat position, providing a sense of syncopation in this phrase; cf. Sound Example 3.1). In the intonation of English, salient pitch accents occur on a subset of the stressed syllables in an utterance, thus marking out a second layer of temporal structure over and above the patterning of stressed syllables (Jun, 2005; Ladd, 1996: Ch. 2). The linguist Bolinger has argued that the temporal patterning of these accents is part of the rhythm of an utterance (cf. Ch. 3, section 3.3.1, subsection “Phonology and Typology”).
The sequential patterns of ups and downs in a melody, regardless of precise interval size, define its pitch contour. (A pitch contour can therefore be specified by a sequence of symbols that simply stand for “up,” “down,” or “same,” rather than a sequence of numbers specifying the direction and size of intervals between pitches.) A pitch contour and its temporal pattern define a melodic contour. Dowling and others (Dowling, 1978; Dowling et al., 1995) have shown over a long series of experiments that melodic contour plays an important role in immediate memory for unfamiliar melodies. This has been demonstrated by showing that when listeners are asked to say if a melody is an exact transposition of a previously heard novel melody, they often give false positive responses to melodic “lures” that share the same contour as the original but not its precise intervals. As more time is given between the two melodies, the tendency for false positives declines and discrimination of exact transpositions from same contour lures improves, suggesting that the mental representation of a novel melody is gradually consolidated in terms of pitch interval categories (cf. Chapter 2, section 2.2.3, subsection “Pitch Intervals and Melody Perception”). This finding is consistent with the view that adults represent familiar melodies in terms of precise interval sizes, a view supported by adults’ equal accuracy at detecting contour-violating versus contour-preserving changes to such melodies (Trehub et al., 1985).
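The up/down/same reduction described above can be made explicit with a short sketch (the function names are my own, not from the melody-perception literature):

```python
def pitch_contour(pitches):
    """Reduce a pitch sequence (e.g., MIDI note numbers) to its contour:
    'U' (up), 'D' (down), or 'S' (same) for each successive step."""
    return "".join(
        "U" if b > a else "D" if b < a else "S"
        for a, b in zip(pitches, pitches[1:])
    )

def same_contour(melody_a, melody_b):
    """Dowling-style 'lures' share the original's contour but not its exact
    intervals; this test treats such pairs as equivalent."""
    return pitch_contour(melody_a) == pitch_contour(melody_b)
```

For example, `pitch_contour([60, 62, 62, 59])` yields `"USD"`, and a lure such as `[60, 61, 57]` matches the contour of `[60, 64, 62]` despite its different interval sizes, which is exactly the confusion listeners show in immediate-memory experiments.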
This view of melodic contour implies that contour-based representations of melodies are “coarse-grained” schema that lack the detail of representations based on musical intervals. Thus it makes sense that melodic contour is one of the first aspects of music to be discriminated by infants, who have not yet developed the interval-based tonal schema of their culture (Trehub et al., 1984, 1987; Ferland & Mendelson, 1989; Trainor & Trehub, 1992; Trehub, Schellenberg, & Hill, 1997). Five-year-old children also seem to rely heavily on contour in melody perception, as they are worse than adults at detecting contour-preserving changes to unfamiliar tone sequences (Schellenberg & Trehub, 1999). Overall, developmental studies such as these suggest that contour-based processing is a kind of default processing of tone sequences, which gradually yields in importance to more detailed processing based on the learned sound categories of musical intervals (cf. Chapter 2). Of course, this is not to say that contour ceases to play a significant role in music perception for experienced listeners. For example, researchers have shown that melodies with fewer changes in melodic contour direction are perceived as simpler than melodies with more such changes (Boltz & Jones, 1986; Cuddy et al., 1981), and that familiar melodies can be recognized even if the interval pattern is somewhat distorted, as long as contour is preserved (Dowling & Harwood, 1986). Furthermore, neuroimaging research suggests that melody perception involves the dynamic integration of contour and interval processing (Patel & Balaban, 2000; Patel, 2003a).
Contour-based processing almost certainly has its origins in intonation perception. Starting early in life, prelinguistic infants in many cultures are exposed to a special speech register known as infant-directed speech, or “motherese” (Fernald, 1985; Fernald et al., 1989). One characteristic of infant-directed speech is the use of exaggerated and distinctive intonation contours to arouse or soothe infants and convey approval, disapproval, and so forth (Fernald & Kuhl, 1987). It is thus adaptive for infants to be sensitive to pitch contour cues from an early age, as these contours play a functional role in emotional communication. Even if infants are not exposed to motherese, as is claimed for some cultures (Ochs & Schieffelin, 1984), normal language acquisition inevitably involves learning the structural equivalence of intonation patterns that are similar in contour but not identical in terms of exact pitch movements. For example, if two different speakers utter “Go in front of the BANK, I said,” using a pitch rise to mark the word BANK as contrastive, they might use different size pitch movements. Nevertheless, the sentence has the same pragmatic meaning, and a listener must learn to grasp this equivalence, which lies in the similarity in pitch contour despite differences in absolute pitch patterns. Thus one can expect all normal language listeners to arrive at competence with pitch contour perception.
The considerations outlined above suggest that the perception of melodic contour could be a fruitful area for comparative speech-music research. In fact, this is proving to be the case. Section 4.5.2 explores the cognitive and neural relations between contour processing in the two domains.
When a musical melody is stopped at a point at which it sounds incomplete, listeners typically have expectations for what the next note will be, even when the melody is unfamiliar. This can be demonstrated by having listeners rate how well different tones continue a melody (Krumhansl et al., 2000), or by having them continue the melody by singing (Carlsen, 1981). In both cases, responses are far from random, and are thought to reflect a combination of culture-specific knowledge and universal Gestalt principles of auditory processing (Krumhansl et al., 1999, 2000). It is the latter principles that are of interest with regard to comparisons with speech intonation.
The most prominent theory of the Gestalt principles governing note-to-note musical expectancies is the “implication-realization” theory of Narmour (1990), which has inspired a good deal of empirical research pioneered by Krumhansl (1991, 1995a, 1995b; cf. Cuddy & Lunney, 1995). This model proposes five innate principles governing the expectancy for a following note given an existing musical interval (hence the term “intervallic implication”) (Schellenberg, 1996). Schellenberg (1997) identified various redundancies between these principles and proposed a simplified two-factor model that is discussed here (cf. Schellenberg et al., 2002; Krumhansl et al., 2000). The first factor is referred to as “pitch proximity”: Listeners expect a subsequent tone to be close in pitch to the last pitch they heard. The second factor is referred to as “pitch reversal,” and actually describes two tendencies. The first is an expectation that after a large interval, the following tone will reverse the direction of the melody. The second is an expectation that reversals of any kind will create a symmetric pattern so that the upcoming tone will be close in frequency (within 2 semitones) to the penultimate tone of the sequence (for example, A-C-A, or A-E-B). Thus pitch reversal requires consideration of the two previous tones heard by a listener. (The discussion of pitch reversal below focuses on reversal per se rather than symmetry.)
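The two factors can be illustrated with a toy scoring function. The additive combination, the 7-semitone cutoff for a “large” interval, and the weights below are assumptions made for the sake of the sketch; they are not the fitted parameters of Schellenberg's model.

```python
def expectancy_score(two_back, last, candidate, leap=7, reversal_weight=6.0):
    """Toy version of the two-factor model: higher scores = more expected.
    Pitches are in semitone units (e.g., MIDI note numbers)."""
    # Pitch proximity: expect the next tone close in pitch to the last one.
    score = -abs(candidate - last)
    # Pitch reversal: after a large interval, expect a change of direction.
    implied = last - two_back
    if abs(implied) >= leap:
        reverses = (candidate - last) * implied < 0
        score += reversal_weight if reverses else -reversal_weight
    return score
```

After a large upward leap, a downward continuation out-scores an equally close upward one; after a small step, proximity alone decides.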
Schellenberg et al. (2002) tested the proximity and reversal principles in experiments in which children and adults rated different possible continuations of melodies. The authors found that pitch proximity was an equally good predictor of expectancy across age groups, but that pitch reversal was a good predictor only in the adult group. This developmental difference shows that pitch reversal is not a hardwired principle of auditory processing, raising the question of where the principle comes from. Might it come from implicit learning of patterns in musical melodies? The tendency for reversals to follow pitch jumps is in fact widespread in musical melody, likely reflecting the simple fact that pitch jumps tend to approach the edge of a melody’s range, forcing a subsequent move in the opposite direction (von Hippel & Huron, 2000). However, if this is a widespread feature of music, why doesn’t it play a role in children’s musical expectancies?
Schellenberg et al. suggest that children’s lack of expectancy for pitch reversal may result from the exposure of children to infant-directed speech (or child-directed speech with similar prosodic properties). Specifically, they suggest that in infant-directed speech large pitch movements are often followed by further movement in the same direction, rather than reversals. Thus the effect of speech might wash out the effect of music on the development of expectancies for pitch reversals. Although speculative, this proposal is notable for advancing the idea that expectations for pitch patterns in musical melodies are shaped not only by experience with music, but also by experience with the pitch patterns of speech. Direct evidence for the idea would require quantifying the statistical regularities of spoken and musical melodies in a common framework, which could be done using methods that will be discussed in section 4.5.1 of this chapter.
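A minimal version of such a common framework: convert F0 values to a continuous semitone scale, then compute the same statistic, here, how often a large interval is followed by a direction reversal, for spoken and musical pitch sequences alike. The 7-semitone leap threshold is an illustrative choice of mine.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Map F0 values in Hz onto a continuous semitone scale, so spoken and
    musical pitch sequences can be described with the same statistic."""
    return [12 * math.log2(f / ref_hz) for f in f0_hz]

def post_leap_reversal_rate(pitches, leap=7):
    """Proportion of large intervals (>= `leap` semitones) followed by a
    move in the opposite direction. Returns None if no leaps occur."""
    leaps = reversals = 0
    for a, b, c in zip(pitches, pitches[1:], pitches[2:]):
        first, second = b - a, c - b
        if abs(first) >= leap and second != 0:
            leaps += 1
            if first * second < 0:
                reversals += 1
    return reversals / leaps if leaps else None
```

Comparing this rate across a corpus of melodies and a corpus of semitone-converted F0 tracks would directly test whether infant-directed speech favors continuation after large pitch movements where music favors reversal.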
A fundamental aspect of melody perception in music is the recognition of motivic similarity between different parts of a melody. For example, the first and last phrase of K0016 are identical (cf. Sound Example 3.1), which helps to provide this melody with a sense of closure. Motivic similarity need not be so literal: People can recognize similarity without identity. Because similarity is a graded feature influenced by many factors, ranging from surface acoustic features (such as timbre and articulation) to midlevel structural features (such as melodic contour) to abstract features (such as implied harmony), its empirical study presents a challenging problem (Schmuckler, 1999; McAdams & Matzkin, 2001), and progress in this area has been slow.
To what extent do the principles that govern the perception of motivic similarity in music draw on mechanisms of similarity perception in intonation? To answer this question one first needs a clear understanding of the basis for intonational similarity judgments in speech. There is some research on this topic, based on the idea that a language has only a limited number of linguistically distinctive intonation contours. For example, commenting on the intonation of British English, Halliday (1970:6) remarked:
There is no limit to the number of different pitch contours that it is theoretically possible to produce. . . . But not all the variations in pitch that the speaker uses . . . are significant. The very large set of possible pitch contours can be thought of as being grouped into a small number of distinct pitch contours . . . rather in the same way that we group the [many] colours that we can tell apart into a small set which we can recognize as different colours, and which we label “yellow,” “red,” “green,” and so on.
Perceptual research on intonation supports Halliday’s idea. One set of studies based on similarity judgments of affectively neutral intonation contours suggests that Dutch listeners recognize six basic intonation patterns underlying the diversity of observed pitch sequences in their language, based on specific types of rises and falls that occur within pitch contours (’t Hart et al., 1990:82-88). This research provides an excellent starting point for those interested in cross-domain research on motivic similarity in speech and music (see also Gussenhoven & Rietveld, 1991; Croonen, 1994: Ch. 7).
“Tonality relations” in music refer to psychological relations between tones resulting from the systematic ways in which tones are employed in relation to each other. For example, Western tonal melodies typically adhere to a musical scale based on 7 out of 12 possible pitches per octave (cf. Chapter 2 for a discussion of musical scales), and departures from this structure are perceptually quite salient to listeners enculturated in this tradition (“sour notes,” discussed later in this section). Furthermore, tones within melodies are organized such that some pitches are more structurally central or stable than others. This can be illustrated with K0016. Consider Figure 4.7a, in which each note of K0016 has been marked with a number representing each tone’s position or scale degree in the scale from which the melody is built (e.g., scale position 1 is do, 2 is re, 3 is mi, etc., and -5 is so in the octave below).
Figure 4.7 (A) K0016 with scale degree of each tone marked. Note that -5 corresponds to scale degree 5 in the lower octave. (B) Harmonization of K0016. The letters below the staff give the chords in the key of C major, whereas the Roman numerals above the staff indicate chord functions (I = tonic, IV = subdominant, V = dominant). For readers unfamiliar with these harmonic terms, they are defined in Chapter 5.
Note that scale degree 1 (the “tonic”) plays a central role in this melody. It is the final tone of phrases 1, 2, 4, and 5, and thus serves as a resting point within the melody and at the melody’s end. Furthermore, it always occurs on a beat and is of long duration (cf. Chapter 3, Figure 3.2). Scale degree 1 thus acts as a perceptual “center of gravity” for the entire melody, a role that can be contrasted with the role of scale degree 2. Degree 2 is far more common than degree 1 in this melody (13 vs. 4 occurrences), yet degree 2 never serves as a resting point: Instead, it almost always leads back to degree 1. Furthermore, it is never long in duration, and frequently occurs off the beat. Thus scale degrees 1 and 2, though neighbors in frequency, vary dramatically in their perceptual stability. This contrast between physical and psychological similarity of tones 1 and 2 is characteristic of tonal music. That is, stable tones within a musical scale are often flanked by unstable tones. This is shown in Figure 4.8, which shows empirical data from a set of classic experiments by Krumhansl and Kessler (1982; cf. Steinke et al., 1997).
In these experiments, listeners heard a brief musical context such as a short chord sequence within a given key⁷ followed by a brief pause and then one of the 12 possible pitches within the octave (the “probe tone”). Listeners were asked to rate how well the tone fit into or went with the preceding musical material, on a scale of 1-7 (in which 7 indicated the best fit). The rating given to each tone can be thought of as a measure of its tonal stability in the given musical context. As can be seen, large asymmetries in stability exist among the 12 tones of the chromatic scale in a major-key or minor-key context. (The figure is oriented to the key of C but represents data from tests of different keys: All data have been transposed to the key of C.)
Focusing on the probe-tone profile for the major-key context, note that the most stable tone (C, or scale degree 1) is flanked by scale tones of low stability (D and B, or scale degrees 2 and 7). Also note that the highly stable tones C, E, and G (degrees 1, 3, and 5) are not neighbors in frequency, but are separated by at least 3 semitones. Thus the pitch hierarchy (or “tonal hierarchy”) shown in Figure 4.8 makes adjacent scale degrees distinctly different in terms of their structural roles, providing a psychological pull that works directly against Gestalt auditory principles that make frequency neighbors psychologically similar (e.g., as members of a single auditory stream). The tension between these two types of relations may be one of the forces that animate musical melodies.⁸
Figure 4.8 Probe tone profiles, indicating listeners’ judgments of how well a given tone fits with a preceding musical context. From Krumhansl & Kessler, 1982.
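A well-known practical use of these profiles is Krumhansl-Schmuckler style key finding: correlate a passage's pitch-class distribution with all 24 rotations of the Krumhansl & Kessler (1982) ratings and pick the best match. A compact sketch (the profile values are the published ratings; the surrounding code is my own):

```python
import math

# Krumhansl & Kessler (1982) probe-tone profiles, oriented to C.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def estimate_key(pitch_classes):
    """Correlate the input's pitch-class distribution with all 24 rotated
    profiles and return the best-matching key as, e.g., 'C major'."""
    dist = [0.0] * 12
    for pc in pitch_classes:
        dist[pc % 12] += 1.0
    best = None
    for tonic in range(12):
        for name, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = _pearson(dist, rotated)
            if best is None or r > best[0]:
                best = (r, f"{NOTE_NAMES[tonic]} {name}")
    return best[1]
```

Fed the pitch classes of a C-major scale, the procedure returns "C major"; in a fuller implementation one would weight each pitch class by its total sounded duration rather than its count.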
The influence of tonal relations on melody perception has been demonstrated in a number of ways. Bigand (1997) had participants listen to increasingly long fragments of a melody and judge the stability of each fragment’s end, in other words, the feeling that the melody could naturally stop versus the feeling that it must continue. Stability judgments varied widely over the course of a single melody, demonstrating the dynamic nature of melody perception (cf. Boltz, 1989). Two main contributing factors to this variance were the final tone’s position in the pitch hierarchy and its duration (cf. Sound Examples 4.5a, b).
Another demonstration of the power of tonality relations is the well-known phenomenon of the “sour note,” demonstrated in Sound Example 4.6. Sour notes are notes that are perceptually salient because they violate the norms of tonality relations. Unlike what the name implies, there is nothing inherently wrong with a sour note: It is a perfectly well-tuned note that would sound normal in another context (and which presumably would not sound sour to someone unfamiliar with tonal music). Its sourness has to do with its lack of membership in the prevailing scale. Furthermore, its degree of sourness is a function of its position in the tonal hierarchy (Janata et al., 2003). Listeners with no explicit musical training can detect such notes, indicating that a basic level of musical syntactic knowledge can be acquired without any formal training. Indeed, the inability to detect such notes is indicative of musical tone deafness or “congenital amusia” (Kalmus & Fry, 1980; Drayna et al., 2001; Ayotte et al., 2002), a disorder discussed later in this chapter.
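In the simplest case, an out-of-scale note in a major key, sour-note detection reduces to a set-membership test, as the sketch below illustrates. (This binary test deliberately ignores the graded effect of tonal-hierarchy position reported by Janata et al., 2003.)

```python
MAJOR_SCALE_PCS = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of a major scale on its tonic

def sour_notes(midi_pitches, tonic_pc=0):
    """Return indices of notes outside the prevailing major scale -- the
    simplest, binary notion of a 'sour note'."""
    return [i for i, p in enumerate(midi_pitches)
            if (p - tonic_pc) % 12 not in MAJOR_SCALE_PCS]
```

In C major, an E-flat (MIDI 63) is flagged; in G major (`tonic_pc=7`) an F-sharp is a scale member but an F-natural is flagged.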
In summary, pitches in musical melodies are organized into scales that feature salient hierarchies of stability. These hierarchies are reflected in the systematic use of different scale degrees for different structural purposes (e.g., scale degree 1 as a stable resting point). As a result, different scale degrees take on distinct psychological qualia in the fabric of the music. As noted by Huron (2006:173), “In a given context, a tone will sound stable, complete, and pleasant. In another context, that exact same tone will feel unstable, incomplete, and irritating.” It is a remarkable fact of tonality that individual pitches can come to life in this way.
At present, there is no evidence of anything resembling scales or pitch hierarchies in speech melody. As noted in section 4.1.1, there is no sense in which some intonational tones are more stable or central than others. This is a very salient difference between musical and spoken melody, but it should not overshadow the many other points of contact between melody in the two domains.
The hierarchical stability relations between pitches described in the previous section were atemporal schema derived from experience with musical melodic patterns. Musical melodies also have pitch hierarchies of a different sort concerning temporal relations between pitches in individual sequences, in other words, “event hierarchies” (Bharucha, 1984a). The basic idea of an event hierarchy is that some pitches in a melody act as its structural skeleton, whereas others serve to elaborate or ornament this skeleton. This notion is central to Western European music theory (e.g., the theories of Schenker (1969), Meyer (1973), and Lerdahl and Jackendoff (1983; cf. Cook, 1987a), and is also found in the music theory of many non-Western cultures, including China and India. For example, in one Chinese folk tradition, musicians speak of ornamentation as “adding flowers” to a basic melodic structure (Jones, 1995). The ability to distinguish structural from ornamental pitches is thought to play an important role in melody perception, such as in a listener’s ability to recognize one melody as an elaborated version of another (cf. Lerdahl & Jackendoff, 1983).
Empirical research has shown that pitch hierarchies, in combination with rhythmic factors, play an important role in shaping event hierarchies in tonal music. Evidence for musical event hierarchies is discussed in Chapter 5, section 5.2.2. Here the pertinent question is whether speech intonation exhibits event hierarchies. Surprisingly, at least one modern approach to intonation (AM theory) suggests that the answer may be “yes,” though the event hierarchies of intonation have nothing to do with pitch hierarchies based on differing degrees of perceived stability. This issue is taken up in section 4.3.1.
Melodies in Western tonal music typically have implied harmony, a background chord progression from which important tones of the melody are drawn (for those unfamiliar with the concept of a musical chord, chords and chord syntax are discussed in Chapter 5). Thus there is a hierarchical level of pitch organization beneath the tones of the melody, with its own principles of combination and patterning. To illustrate this point, Figure 4.7b shows a harmonic analysis in terms of underlying chords for K0016, and Sound Example 4.7 presents K0016 with a chordal accompaniment to make this harmony explicit. The structure of the chord sequence underlying a melody plays a role in melody perception for both musicians and nonmusicians, influencing judgments of the musical coherence of melodies as well as memory for melodies (Cuddy et al., 1981; Povel & Jansen, 2001). There is also evidence from studies of song learning that listeners abstract the underlying harmonic structure of a melody. Sloboda and Parker (1985) had listeners sing back unfamiliar melodies presented to them on a piano. An analysis of their errors of recall (e.g., interval errors) showed that they preserved the overall contour of the melody and the background harmonic progression (cf. Davidson et al., 1981).
Although there is nothing resembling chord structure or harmony in speech intonation, there is intriguing evidence from speech synthesis that local pitch events may combine into larger structures that have their own principles of patterning. In one approach to intonation synthesis (the IPO model, described in section 4.3.2), intonation contours are constructed from standardized pitch movements arranged in sequences (’t Hart & Collier, 1975; ’t Hart et al., 1990). Researchers have found that synthesis of acceptable intonation patterns involves arranging small pitch movements nonrandomly into local “configurations” (linking certain kinds of rises and falls), which in turn are sequenced to form contours spanning a single clause. Not all possible sequences of configurations form acceptable contours; in other words, constraints on contour formation also exist. This multilevel organization is reminiscent of the organization of successive pitches into chords and chords into sequences, and both may reflect a general propensity to organize pitch sequences into patterns at multiple hierarchical levels (cf. Ladd, 1986).
The preceding eight sections have discussed different kinds of perceptual relations engendered by musical melody. There are also relations between these relations, in other words, meta-relations. For example, a slight misalignment of grouping and beat can add rhythmic energy to a melody in the form of anacrusis or upbeat, as in the onset of phrase 2 of K0016. Another meta-relation concerns the relation between the tonal hierarchy and rhythm. Boltz (1991) has found that melodies in which tonally stable pitches occur at regular temporal intervals (e.g., at the ends of phrases) are remembered more accurately than melodies without this feature.
Are there any meta-relations in musical melodies that one can compare to speech intonation? One relevant line of research in this regard concerns the perceptual interactions of accents due to pitch and rhythm, in other words, the “joint accent structure” of melodic sequences (Jones, 1987, 1993). Joint accent structure refers to the temporal relation between salient points in the pitch pattern of a melody, such as contour peaks (Huron & Royal, 1996), and salient points in the temporal pattern of a melody, such as lengthened tones. Evidence suggests that listeners are sensitive to the relative timing of these points in music: Their alignment and periodicity relations can influence memory for melodies (Jones & Ralston, 1991) and how well people can synchronize their taps with accents in melodies (Jones & Pfordresher, 1997). For the purposes of comparing melody in music and speech, the issue of temporal alignment between pitch peaks and long events is of interest, because contour peaks and lengthened events can be identified in both musical melodies and in intonation contours. To illustrate this idea, consider Figure 4.9, which shows the pitch and duration of each vowel for a sentence of French: “Les mères sortent de plus en plus rapidement de la maternité” as uttered by a female speaker (cf. Sound Example 4.8).
The pitch values in the figure were computed from a prosogram analysis of the sentence (cf. Figure 4.4, and section 4.3.2), and are shown in terms of semitones from the lowest pitch in the sentence. Note the salient peaks in both the pitch and duration time series. It would be interesting to determine if the alignment of pitch and duration peaks in French sentences shows any statistical regularities, and if these regularities differ from the patterns found in other languages (e.g., British English). If so, one could then examine melodies from the music of two cultures, and ask if linguistic alignment differences are reflected in the alignment of melodic contour peaks and long tones in music.
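The conversion described here, from F0 values in Hz to semitones above the lowest pitch in the sentence, is a simple logarithmic transform. A minimal sketch, using invented vowel F0 values rather than the measured data from the French sentence in Figure 4.9:

```python
import math

def hz_to_semitones(f0_hz, ref_hz):
    """Semitones of f0_hz above the reference frequency ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)

# Hypothetical vowel F0 measurements (Hz); not the values from Figure 4.9.
vowel_f0 = [220.0, 247.0, 196.0, 262.0, 208.0]
ref = min(vowel_f0)  # the lowest pitch in the sentence defines 0 semitones
contour_st = [hz_to_semitones(f, ref) for f in vowel_f0]
```

Because the scale is logarithmic, an octave doubling corresponds to exactly 12 semitones regardless of the speaker's register, which is what makes semitone plots comparable across voices.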
Figure 4.9 The pitch and duration of each vowel in a sentence of French: “Les mères sortent de plus en plus rapidement de la maternité” as spoken by a female speaker. Top: Pitch values as computed from a prosogram, shown as semitones from the vowel with the lowest pitch. Bottom: Duration of each vowel in ms.
Section 4.2 has reviewed a number of key aspects of melody perception, and found that in many cases one can draw a parallel with some aspect of speech intonation perception. This raises an obvious question. If there are interesting parallels between musical and linguistic melody perception, then why do melodies in the two domains “feel” so different in terms of subjective experience? For example, musical melodies can get “caught in one’s head” for days, whereas spoken melodies rarely capture our interest as sound patterns. One reason for this may be that musical melodies engender a much richer set of perceptual relations. For example, section 4.2 lists eight basic relations invoked by a simple musical melody, and this is by no means an exhaustive list (cf. Meyer, 1973; Narmour, 1990; Patel, 2003a). Furthermore, there are many possible meta-relations between these relations. (In fact, given just eight relations, the number of possible pairwise meta-relations is 28!) This may be why musical melody perception is psychologically so distinct from intonation perception. It is not only that musical melodies have perceptual relations that spoken melodies do not (such as beat, interval structure, and the tonality relations built on interval structure); it is the fact that these additional relations lead to many more meta-relations. Thus the possible perceptual relations in musical melodies are far more numerous and intricate than those in speech melodies.
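The figure of 28 pairwise meta-relations is simply the number of unordered pairs that can be drawn from eight relations, C(8, 2). A one-line check:

```python
from math import comb

n_relations = 8
n_pairwise_meta = comb(n_relations, 2)  # unordered pairs of distinct relations
print(n_pairwise_meta)  # prints 28
```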
The greater intricacy of musical melodies should not deter the search for links between melody in the two domains. On the contrary, a solid understanding of the components of musical melody perception can help guide the search for meaningful connections with language. Based on the review in section 4.2, one link that stands out as particularly promising is melodic contour perception. For this reason, the topic is taken up in greater detail in section 4.5.2, which draws on data from cognitive neuroscience to further explore this connection between music and language.
Although section 4.2 focused on musical melody and its connections to speech, in this section the perspective is reversed. That is, speech intonation is given priority and the focus is on what modern research on intonation suggests about connections with music. One challenge for this perspective is selecting among the many theories of intonation available today (Hirst & Di Cristo, 1998). Here I focus on two theories, chosen because of their relevance to comparative speech-music research.
Before delving into these theories, it is worth returning to a point made earlier in this chapter about an important difference between melody in speech and music (cf. section 4.1.1). Unlike musical melody, speech intonation is not built around a stable set of pitch intervals. Why is this the case? Given the enormous range of linguistic diversity, why is there not a single known language that uses stable pitch intervals for intonation? Such a system is theoretically possible, because interval-based pitch contrasts are found in the vocal music of virtually every culture.
The likely reason is that spoken language mixes affective and linguistic intonation in a single acoustic channel. Affective intonation is an example of a gradient signaling system: Emotional states and the pitch cues signaling them both vary in a continuous fashion.9 For example, high average pitch and wide-ranging, smooth pitch contours convey a happy rather than a sad affective state to a listener (Juslin & Laukka, 2003). Furthermore, there are other nonlinguistic factors that influence pitch height and pitch range in a continuous way, such as the loudness with which one speaks (Shriberg et al., 2002). This means that intonation cannot utilize a frequency grid with fixed spacing between pitches. Instead, the pitch contrasts of linguistic intonation must be realized in a flexible and relative way (Ladd, 1996; cf. Chapter 2, section 2.3.2, subsection “A Closer Look at Pitch Contrasts Between Level Tones in Tone Languages”). The lack of a stable interval structure is the single most salient difference between musical and spoken melodies, and is likely why theories of musical melody and theories of speech melody have had so little conceptual contact.10 Despite this fact, it is possible to identify interesting connections between linguistic and musical melody, as we shall see.
One other point worth mentioning is that the linguistic use of intonation is more diverse than has generally been appreciated. For example, to a speaker of English it might seem that using pitch to put focus on certain words in sentences is a universal property of intonation, but a comparative perspective reveals that this is not the case. Somali uses linguistic morphemes to mark focus on words, rather than intonation (Antinucci, 1980; Lecarme, 1991). Another example of intonational diversity concerns the relationship between rising pitch at the end of yes–no questions versus falling pitch at the end of statements. Although this pattern is common in Western European languages, it is by no means universal. For example, questions in Standard Hungarian have high pitch on the penultimate syllable of a sentence, followed by a sharp pitch fall on the final syllable, so that “it is not overstating the case to say that many Hungarian questions sound like emphatic statements to native speakers of English . . .” (Ladd, 1996:115). Furthermore, in Belfast English, statements are produced with a final pitch rise as often as questions (Grabe, 2002). These are but two examples, but they serve to indicate that generalizations about linguistic intonation must be made with care (Ladd, 2001).
One of the most influential frameworks for intonation research today is “autosegmental-metrical” (AM) theory, which has its origins in the work of Bruce (1977) and Pierrehumbert (1980). This approach tends to be practiced by researchers whose primary focus is linguistic structure rather than auditory perception. Because a detailed discussion of AM theory is beyond the scope of this book (see Ladd, 1996, and forthcoming), here I give only a brief introduction oriented toward comparative language-music research. I also focus on English, in which most research in the AM framework has taken place, though AM-style descriptions exist for an increasingly diverse range of languages (Jun, 2005).
A basic tenet of AM theory is that the linguistic aspects of intonation are based on categorical distinctions involving sequences of discrete pitch events. Specifically, the continuous F0 contours of speech are viewed as reflecting a sequence of distinct “pitch accents” and “edge tones” associated with certain points in the physical F0 contour. The rest of the intonation contour is considered a mere interpolation between these events, so that most of the F0 trajectory is not linguistically significant. That is, F0 contours in speech are seen as the result of moving between certain well-defined targets in pitch and time, a view quite analogous to the conception of melody in music (especially vocal music, in which voice pitch often moves in a sinuous manner, e.g., Seashore, 1938; Sundberg, 1987; cf. Bretos & Sundberg, 2003).
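The idea that an F0 contour results from moving between a few well-defined targets in pitch and time, with mere interpolation in between, can be sketched as piecewise-linear interpolation over sparse (time, F0) targets. The target values below are hypothetical, chosen only to illustrate the shape of such a model:

```python
def interpolate_contour(targets, times):
    """Linearly interpolate F0 (Hz) at `times` from sparse (time_s, f0_hz) targets."""
    track = []
    for t in times:
        # find the pair of targets bracketing t and interpolate between them
        for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
            if t0 <= t <= t1:
                frac = (t - t0) / (t1 - t0)
                track.append(f0 + frac * (f1 - f0))
                break
    return track

# Hypothetical targets: a pitch-accent peak at 0.4 s, a low edge tone at 1.2 s.
targets = [(0.0, 180.0), (0.4, 240.0), (1.2, 140.0)]
f0_track = interpolate_contour(targets, [0.0, 0.2, 0.4, 0.8, 1.2])
```

The point of the sketch is that only the three targets carry linguistic information; every other sample in `f0_track` is predictable filler between them.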
Pitch accents in the AM model are built from a basic inventory of tones. For example, in one current theory that informs the ToBI system of analysis (Beckman et al., 2005), there are two tones, associated with two contrastive pitch levels: high and low (H and L). Different types of pitch accents can be built from these tones, for example, H*, L*, L*+H, and L+H* (see Table 4.1; the star means that the given tone is associated with a stressed syllable). The first two of these involve single tones, whereas the latter two are “bitonal” and involve a rising pitch movement from low to high. It has been argued that in English these different pitch accents signal the pragmatic status of particular words (e.g., Pierrehumbert & Hirschberg, 1990; cf. Wennerstrom, 2001). This builds on a long-held idea among linguists that the units of intonation have discourse meaning (e.g., Pike, 1945; Bolinger, 1958; cf. Ladd, 1987).
In addition to pitch accents, the AM approach posits that intonation contours have tones that mark the edges of phrases. Words in English sentences are claimed to be grouped into prosodic phrases at two levels: smaller intermediate phrases, or “ip’s,” whose edges are marked by a high or low phrase tone (H-, L-), and larger intonational phrases, or “IP’s,” whose edges are marked by a high or low boundary tone (H%, L%). As with pitch accents, a pragmatic meaning has been suggested for phrase tones and boundary tones: They serve to indicate whether an ip or an IP is related to the preceding or following ip/IP. In contrast to pitch accents, edge tones do not convey perceptual prominence.
Table 4.1 The Discourse Meanings of Different Pitch Accents in AM Theory
An important aspect of AM theory with regard to comparison with music concerns the sequential relations between the basic elements (pitch accents and edge tones). How constrained are these relations? Pierrehumbert and Hirschberg (1990) argued that the meaning of intonation contours is “compositional,” in other words, each element contributes a particular pragmatic meaning and the contour as a whole (i.e., the intonational “tune” or sequence of elements) conveys no additional meaning above and beyond the simple sum of the parts. This suggests that tunes are subject to minimal constraints in terms of the sequencing of elements, and that for the most part tones are chosen independently of the tones that immediately precede them. This stands in sharp contrast to musical melody, in which tones are patterned at both local and global scales. There is evidence, however, that intonational sequences do have sequential constraints (cf. Ladd, 1996: Ch. 6). For example, Grabe et al. (1997) showed that the meaning of an initial H or L boundary tone depended on whether it was followed by a H or L pitch accent. More recently, Dainora (2002) examined a large corpus of speech that had been labeled with AM notation and searched for statistical constraints on sequences of pitch accents and edge tones. She found strong evidence that tones are not chosen independently of the tones that precede them. For example, the identity of an edge tone (as L or H) is influenced by the particular types of pitch accents that precede it. Thus intonational tunes do have some statistical regularities, though the constraints appear to be quite weak compared to musical melodies. Dainora found that these regularities could be captured by a second-order Markov model (a statistical model in which an element’s identity depends upon only the identity of the two preceding elements).
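A second-order Markov model of the kind Dainora used can be sketched as trigram counts over tone labels: each tone's probability is conditioned on the two tones that precede it. The label inventory and the toy corpus below are invented for illustration; they are not her data:

```python
from collections import defaultdict

def train_second_order(sequences):
    """Count how often each tone follows each pair of preceding tones."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = ["<s>", "<s>"] + seq  # start-of-utterance padding
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            counts[context][padded[i]] += 1
    return counts

def prob(counts, context, tone):
    """Conditional probability of `tone` given the two preceding tones."""
    total = sum(counts[context].values())
    return counts[context][tone] / total if total else 0.0

# Toy corpus of AM-style tone sequences (hypothetical, not from a real corpus).
corpus = [
    ["H*", "L-", "H*", "L-", "L%"],
    ["H*", "!H*", "L-", "L%"],
    ["L*", "H-", "H*", "L-", "L%"],
]
model = train_second_order(corpus)
```

In this toy model, the context (H*, L-) is followed by L% twice and by H* once, so the estimated probabilities are 2/3 and 1/3: the identity of the next tone depends on its predecessors, which is the statistical dependence Dainora reported.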
A word needs to be said about the two distinct tonal levels in the AM system. Because these are abstract levels they do not correspond directly to specific pitch values or a specific pitch interval between H and L, but simply indicate the relative pitch height of a tone with regard to neighboring tones. In fact, the mapping between these abstract levels and actual frequency values (referred to as “scaling”) is a complex issue that is the focus of ongoing research. Such research suggests that multiple factors are involved in scaling, including the degree of prominence of a word, declination (the general tendency for F0 values to decrease over the course of an utterance; cf. Section 4.1.1), and the local context in which tones occur. For example, a high tone may be “downstepped” (lowered in frequency) when it occurs in certain tonal contexts, or the pitch target of a tone may not be realized because of undershoot due to tonal crowding. As a consequence of these complexities, H and L tones do not correspond to fixed frequency values, but are simply relative positions within a pitch range that is itself subject to change (Pierrehumbert, 2000). Consequently, analysis of intonation contours in terms of H and L tones is not something that can be done automatically on the basis of pitch contours, but requires training in AM theory and is done by individuals who listen to sentences while examining a visual display of the F0 contours.
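One common way to think about scaling phenomena such as downstep is as a constant-ratio lowering of each successive H target toward a speaker-specific reference line, so that peaks decay exponentially rather than by a fixed number of Hz. The sketch below is a hedged illustration of that idea; the ratio and frequency values are invented, not drawn from any published fit:

```python
def downstepped_peaks(first_peak_hz, reference_hz, ratio, n_accents):
    """Each successive H target sits a fixed fraction of the way
    from the reference line up to the previous peak."""
    peaks = [first_peak_hz]
    for _ in range(n_accents - 1):
        peaks.append(reference_hz + ratio * (peaks[-1] - reference_hz))
    return peaks

# Hypothetical speaker: first H* peak at 240 Hz, reference line at 120 Hz.
peaks = downstepped_peaks(240.0, 120.0, 0.7, 4)
```

On this kind of model the same phonological H tone surfaces at a different frequency on every occurrence, which is one concrete illustration of why H and L cannot be mapped to fixed frequency values.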
One interesting area of empirical research motivated by the AM model is the precise timing of tones with respect to phonetic boundaries (referred to as “tonal alignment”). Research on alignment has given rise to the idea that the abstract tones of AM are implemented as phonetic “tonal targets”: particular points in the intonation contour (often peaks or valleys in the F0 trajectory) that have stable relationships to certain phonetic landmarks, such as the edges of stressed syllables (Arvaniti et al., 1998; Ladd et al., 1999). For example, it has been proposed that certain types of tonal targets in Greek occur at a certain short distance in time after the end of a stressed syllable. One interesting line of evidence for the perceptual significance of tonal alignment concerns the speech of impersonators. Zetterholm (2002) studied a professional Swedish impersonator who was good at imitating the voice of a famous Swedish personality. Focusing on the production of one particular word that occurred several times in recordings of the impersonator’s natural voice and of his voice while impersonating, Zetterholm showed that the impersonator systematically changed the alignment of the tonal peak of this word from a value characteristic of the impersonator’s native dialect to a value characteristic of the dialect of the imitated person. Turning to linguistic differences in alignment, Atterer and Ladd (2004) found that pitch rises in German showed later alignment than comparable pitch rises in English, and that native speakers of German carried over their alignment patterns when speaking English as a second language.
Some readers may find the AM approach overly abstract in positing a categorical structure beneath the continuous acoustic variation of F0. A comparison to vowel perception is helpful here. No one disputes that every language has an inventory of linguistically distinct vowels, each of which forms a distinct perceptual category for listeners of that language. Yet in the acoustics of speech, a given vowel does not correspond to a fixed acoustic pattern. Not only does a vowel’s acoustics vary across speakers due to vocal tract differences and the resulting differences in formant values (Peterson & Barney, 1952), within the speech of a single speaker a vowel’s acoustic realization varies due to coarticulation and other factors, such as undershoot due to rapid speech (Lindblom, 1990). Nevertheless, listeners are able to abstract away from this diversity and hear the distinct linguistic categories beneath them; indeed, this ability is present even in infants (Kuhl, 1983). Vowel perception shows that the mind is quite adept at parsing graded variation in a continuous acoustic space (in this case, of formant values) into discrete linguistic categories.
The AM approach treats intonation in much in the same way that phonology treats vowels, positing discrete mental categories underneath a continuous and variable phonetic surface. Of course, it is easier to demonstrate perceptual discreteness with vowels than with intonation. Changing a vowel alters the semantic meaning of a word in a categorical fashion (e.g., bet vs. beet), so that listeners can readily agree that different vowels correspond to distinct linguistic categories. It is more difficult to demonstrate that pitch contrasts signal distinct pragmatic categories to listeners. Simply asking listeners what a given pitch accent “means” is not an optimal strategy, as listeners are not used to making explicit judgments about pragmatic (as opposed to semantic) meanings. Thus indirect tests of discreteness in intonation perception are needed. One interesting approach, pioneered by Pierrehumbert and Steele (1989), is to use an imitation paradigm in which a participant is asked to mimic as closely as possible the intonation of a short utterance with a particular F0 pattern. Using computer editing of sound, the F0 pattern of the model utterance is varied in small increments between two extremes representing two different hypothesized pitch accent types. Examination of the F0 patterns produced by participants suggests a bimodal distribution, as if listeners are parsing the continuum of F0 variation into two distinct categories (cf. Redi, 2003; Dilley, 2005).
What sorts of links between language and music does the AM approach suggest? At a very general level, the AM approach draws attention to the idea that speech melody has two facets: an abstract phonological structure of underlying pitch contrasts and a physical realization of pitch versus time (cf. Cutler et al., 1997). This view of melody is quite analogous to the way musical melody is conceptualized: The same musical melody sung by different people will differ in fine-grained acoustic details of the pitch trajectory, but will nevertheless articulate the same set of underlying pitch contrasts.
Although this is a general conceptual similarity between melody in the two domains, the AM approach also suggests a much more specific point of contact between melody in speech and music. Recall that an important aspect of AM theory is that only certain points in the F0 contour are considered to be linguistically significant. For example, consider the AM analysis of the sentence in Figure 4.5. As can be seen, this sentence has four pitch accents (two high tones and two downstepped high tones, labeled as H* and !H*, respectively) and three edge tones (L- tones at the end of each ip, plus a L% tone at the end of the entire utterance, which combines forces with the L- tone in the final ip in terms of influencing the F0 contour). Recall that the asterisk notation indicates that the given tone is associated with a stressed syllable. (Note that not all stressed syllables receive a pitch accent in this sentence. For example, “eat” and the first syllable of “donuts” are stressed but are not associated with a phonological tone. This is not unusual: AM analyses of English sentences often identify pitch accents on just a subset of the stressed syllables of the utterance.) Yet although AM theory identifies just a few points in a pitch contour as linguistically significant, listeners almost certainly hear pitch on each syllable of a sentence (cf. the prosogram, discussed in section 4.3.2). Hence built into AM theory is the notion of an event hierarchy, with some pitch events in a sequence being more structurally important than others.
As we saw in section 4.2.7, the notion of an event hierarchy is central to modern theories of melody perception in music, in which it is common to think of melodies as having some tones that are more structurally important than others. This suggests that the mental processing of a tone sequence in terms of an event hierarchy of relative importance may have its origins in the perception of speech intonation. This is a question that is open for research. A first step in addressing the issue would be to seek empirical evidence for event hierarchies in speech melody. For example, one might try discrimination experiments in which the same sentence is presented twice with a slight difference in its F0 contour. The question of interest is whether a difference of a given size is more salient if it occurs at a point that AM theory indicates is structurally important than if it occurs in an “interpolation” region. If this was demonstrated, and if it could not be explained on a simple psychoacoustic basis, it would open the way to exploring event hierarchies in speech melodies and their relationship to musical melodic structure.
I now turn to a different line of research on intonation, one that has roots in the study of auditory perception. This approach originated at the Institute for Perception Research in The Netherlands, or the IPO, which represents the initials of the Dutch name of that institute (’t Hart et al., 1990). Research at the IPO was motivated by the desire to synthesize sentences with natural-sounding intonation in different languages, and was enabled by speech technology that allowed full control over F0 contours in synthetic speech.
Prior to the IPO approach, most intonation research used impressionistic transcriptions of pitch based on the ears of the individual doing the transcribing. Indeed, this approach dates back to Joshua Steele’s 1779 book, An Essay Toward Establishing the Melody and Measure of Speech to Be Expressed and Perpetuated by Peculiar Symbols. Steele worked by ear, using sliding fingerings on a bass viol to mimic intonation in order to transcribe it. (Steele borrowed the convention of the musical staff for transcription, but instead of using fixed pitches, he used short curved lines to indicate the pitch movement occurring on each syllable; cf. Kassler, 2005.) Several other notation systems were developed by later British and American researchers, using a variety of symbols to indicate how voice pitch moved up and down over the course of an utterance. Because of technological limitations, these pioneering researchers worked without the benefit of actual measurements of F0 over time. Naturally this meant that the notations were impressionistic and subject to a good deal of individual variation.
A major technical advance that led to the IPO approach was the ability to measure the fundamental frequency contour of a sentence, and then to resynthesize the sentence with the original contour or with an altered contour imposed by the researcher. Researchers could then test the perceptual significance of systematic alterations in the contour. This powerful “analysis-by-synthesis” approach transformed intonation research from an impressionistic endeavor to a quantitative, perception-based science. For example, the researchers quickly made a remarkable discovery: It was possible to replace the original contour with a simpler version that was perceptually equivalent to the original. This process of “close-copy stylization” of F0 is shown in Figure 4.10.
The dotted line shows the original F0 contour of an English sentence, whereas the solid line is the close-copy stylization. The close-copy contains the minimum number of straight line segments that produces a sentence that is perceptually equal to the original. The success of this stylization procedure led to the idea that listeners extract a few structurally significant pitch movements from the details of the actual F0 curve of a sentence. One reason that the details are not relevant to perception may be that they are inevitable byproducts of the machinery of speech. For example, the vowels /i/ and /u/ tend to have higher pitch than the vowels /a/ and /ae/ (Whalen & Levitt, 1995), perhaps because raising the tongue for a high vowel causes upward tension on the hyoid bone and an involuntary tensing of the vocal folds (Ladefoged, 1964:41). Also, vowels following voiceless consonants (such as /p/ and /f/) tend to start on a higher pitch than vowels following voiced consonants (such as /b/ and /v/), due to vocal fold biomechanics (Löfqvist et al., 1989). Effects such as these produce “microintonation”: small pitch deflections that are not intended by the speaker. The perceptual equivalence of original and close-copy intonation contours suggests that these fluctuations are not relevant to listeners.
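The IPO researchers constructed close-copy stylizations by hand, verifying perceptual equality by ear. Purely to illustrate the geometric idea of approximating an F0 curve with a small number of straight line segments, the following sketch recursively splits the contour at the point of maximum deviation from the current chord (a Ramer-Douglas-Peucker-style approximation). The tolerance value and the procedure itself are illustrative assumptions, not the IPO method:

```python
def stylize(times, f0_st, tol=1.5):
    """Greedy piecewise-linear approximation of an F0 contour (in semitones):
    recursively split at the point of maximum deviation from the straight
    line (chord) between the endpoints, until every point lies within `tol`
    semitones of the fit. Returns the indices of the breakpoints kept."""
    def max_dev(i, j):
        # Deviation of interior points from the chord joining points i and j.
        best_k, best_d = None, 0.0
        for k in range(i + 1, j):
            frac = (times[k] - times[i]) / (times[j] - times[i])
            interp = f0_st[i] + frac * (f0_st[j] - f0_st[i])
            d = abs(f0_st[k] - interp)
            if d > best_d:
                best_k, best_d = k, d
        return best_k, best_d

    def recurse(i, j):
        if j - i < 2:
            return []
        k, d = max_dev(i, j)
        if d <= tol:
            return []  # chord is already a good enough fit
        return recurse(i, k) + [k] + recurse(k, j)

    n = len(times)
    return [0] + recurse(0, n - 1) + [n - 1]

# A linear contour needs no interior breakpoints:
print(stylize([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))         # [0, 4]
# A peaked contour keeps the peak as a breakpoint:
print(stylize([0, 1, 2, 3, 4], [0, 0, 5, 0, 0], tol=3))  # [0, 2, 4]
```

A real close-copy procedure would, of course, be validated perceptually rather than by a fixed numerical tolerance.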
Figure 4.10 Example of a stylization of an F0 contour for a sentence of English (accented syllables are underlined). The original F0 contour is shown with dots, and the close-copy stylization is shown with solid lines. Note that frequency is plotted on a logarithmic scale, in other words, a fixed distance along this axis represents a fixed ratio between F0 values, rather than a fixed numerical difference. From ’t Hart et al., 1990.
For the current purposes, the significance of the IPO approach is that it showed that the raw F0 contour of a sentence, although an accurate physical description of the speech signal, is not the most accurate representation of intonation as it is perceived by human listeners. This helped open the way to studies of “F0 stylization,” in other words, transformations of the F0 contour into simpler representations meant to capture how spoken pitch patterns are perceived by human listeners (cf. Rossi, 1971, 1978a, 1978b; Harris & Umeda, 1987; Hermes, 2006). F0 stylization is important for speech-music comparisons, as we shall see. Before expanding on this point, however, it should be noted that the IPO approach went beyond stylization to seek an inventory of standardized pitch movements from which natural-sounding speech melodies could be constructed. Speech melodies in the IPO approach consist of standardized pitch movements arranged in sequences (cf. de Pijper, 1983; Collier, 1991). The movements were differentiated by their direction, size, and duration, yielding an inventory of basic pitch rises and falls.11 Pitch movements were used to mark important words and structural boundaries, and took place between pitch levels of an abstract frequency grid that declined over the course of an utterance, as shown in Figure 4.11 (cf. section 4.1.1 for a discussion of declination).
In terms of comparisons to music, an interesting feature of the IPO approach is the organization of spoken melodic patterns at multiple hierarchical levels. At the most local level, pitch movements are combined nonrandomly into “configurations” (i.e., the linking of certain kinds of rises and falls). At the next level, configurations are linked together to form “contours” spanning a single clause (’t Hart et al., 1990:81). Not all possible sequences of configurations form acceptable contours; instead, constraints on contour formation exist and can be indicated in the form of a branching flow chart or transition network of pitch movements. This approach to the grammar of melody has interesting parallels to the principles governing variation of musical melodies within “tune families” (McLucas, 2001). As is evident from this description, the IPO approach eventually became a quasi-phonological model of intonation, which specified linguistically significant intonational contrasts and their patterning in utterances.
Figure 4.11 Example of a standardized pitch movements for a sentence of British English (accented syllables are underlined). The original F0 contour is shown with dots, the close-copy stylization is shown with dashed lines, and the stylized movements are shown with a solid line. The stylized movements move between abstract reference lines that decline in frequency over time (H = high, M = mid, L = low). From ’t Hart et al., 1990.
From the current perspective, however, the important legacy of the IPO approach is the notion of pitch stylization (cf. Hermes, 2006). This approach has been further developed by other researchers. One of the most notable achievements in this regard is the prosogram model, which was introduced in section 4.1.2 (Mertens, 2004a, 2004b). The prosogram is based on empirical research that suggests that pitch perception in speech is subject to three perceptual transformations. The first is the segregation of the F0 contour into syllable-sized units due to the rapid spectral and amplitude fluctuations in the speech signal (House, 1990). The second is a threshold for the detection of pitch movement within a syllable (the “glissando threshold”; ’t Hart, 1976). If the rate of pitch change within a syllable (in semitones/second) does not exceed this threshold, then the syllable is perceived as having a level pitch corresponding to a temporal integration of F0 within the syllable (d’Alessandro & Castellengo, 1994; d’Alessandro & Mertens, 1995).12 This is a crucial feature in terms of comparing speech and music, as discussed below. The third transformation, which is less relevant for the current purposes, applies when the glissando threshold is exceeded. This is a threshold for detection of a change in the direction of a pitch glide within a syllable (the “differential glissando threshold”). If the differential glissando threshold is exceeded, then a single syllable is perceived as having more than one pitch movement. Thus for example, a single syllable can be perceived as having a pitch rise followed by a pitch fall, if the syllable is long enough and there is enough pitch change within it.
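As a rough sketch of the second transformation: assuming a single fixed glissando threshold (the actual prosogram uses a duration-dependent threshold) and approximating the temporal integration of F0 with the geometric mean of the endpoint frequencies, one might classify a syllable nucleus like this. The threshold value and helper names here are illustrative assumptions, not Mertens's implementation:

```python
import math

# Hypothetical fixed glissando threshold in semitones/second; the real
# prosogram varies this with syllable duration ('t Hart, 1976).
GLISSANDO_THRESHOLD = 16.0

def perceived_tone(f0_start_hz, f0_end_hz, duration_s):
    """Classify a syllable nucleus as a level tone or a glide.

    Returns ("level", pitch_hz) when the rate of pitch change stays under
    the glissando threshold, otherwise ("glide", (start_hz, end_hz)).
    """
    # Pitch change in semitones: 12 * log2(f_end / f_start)
    change_st = 12.0 * math.log2(f0_end_hz / f0_start_hz)
    rate = abs(change_st) / duration_s  # semitones per second
    if rate < GLISSANDO_THRESHOLD:
        # Heard as a level pitch; geometric mean stands in for the
        # temporal integration of F0 over the nucleus.
        return ("level", math.sqrt(f0_start_hz * f0_end_hz))
    return ("glide", (f0_start_hz, f0_end_hz))

# A 1-semitone rise spread over 200 ms (5 st/s) is heard as a level pitch;
# the same rise compressed into 40 ms (25 st/s) is heard as a glide.
print(perceived_tone(200.0, 200.0 * 2 ** (1 / 12), 0.20)[0])  # level
print(perceived_tone(200.0, 200.0 * 2 ** (1 / 12), 0.04)[0])  # glide
```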
A key question about the prosogram concerns the units to which it assigns tones. Although perceptual research suggests that listeners segregate F0 contours into syllable-sized units (House, 1990), it also appears that pitch changes during syllable onsets do not play a significant role in perception. This may be because such regions are often the locus of microintonation effects (cf. Hermes, 2006). Hence it appears that the syllable rime (the syllable stripped of its onset consonants, i.e., the vowel plus following consonants) is the primary domain that conveys pitch information to a listener. Figure 4.12 shows a rime-based prosogram analysis of the same sentence depicted in Figure 4.4.
Recall that Figure 4.4 was computed using vowels rather than rimes as the unit of prosogram analysis. As can be seen from comparing these two figures, the patterning of tones in terms of pitch is very similar in the two prosograms. (The rime-based tones are often longer, of course, because they include more of the F0 contour.) The similarity of these representations suggests that vowels provide a reasonable unit of analysis for the prosogram, especially if one is interested in the pitch (vs. durational) patterning of tones. This is fortunate, because identification of vowels is phonetically simpler than identification of rimes (which require linguistic decisions about syllable boundaries). Thus in practice, prosogram computation requires only that the user supply the timing of vowel onsets and offsets in an utterance. The prosogram then computes perceived tones according to the second and third transformations mentioned in the preceding paragraph.13
Stepping back from the details of prosogram computation, Figures 4.4b and 4.12 reveal why the prosogram is so useful for speech-music comparisons. The representation of intonation in terms of syllable tones is quite music-like, because most tones are level pitches. From a cognitive perspective, this is interesting because it implies that the auditory image of speech intonation in a listener’s brain has more in common with music than has generally been believed. From a practical perspective, the dominance of level pitches means that spoken and musical melodies can be compared using quantitative measurements of pitch patterns. This approach informs the empirical comparative work on spoken and musical melody described below in section 4.5.1.14
Figure 4.12 Illustration of a rime-based prosogram of the same sentence depicted in Figure 4.4. (Axis conventions as in Figure 4.4.) Note the similarity between the pitches of this prosogram and the prosogram in Figure 4.4B, which was computed using a vowel-based analysis. The temporal onset and offset of each rime is indicated by vertical dashed lines. In the syllable transcription above the sentence, rimes have been underlined, and an arrow connects each rime to the corresponding region of the prosogram. When a prosogram tone occupies only a portion of a rime (e.g., in the 6th syllable, “is”), this is typically due either to lack of voicing or low signal amplitude in some part of the rime.
In keeping with the rest of this book, the focus of this chapter is on the relationship between spoken language and instrumental music. Nevertheless, it is worth touching briefly on song, because the artful combination of words and music creates a natural area for the study of melodic relations between speech and music. An interesting question about song is how its musical melody relates to the pitch pattern of the spoken lyrics. This is particularly relevant for tone languages, in which pitch makes lexical distinctions between word meanings. In such languages, distorting the spoken pitch pattern can affect intelligibility, raising the question of to what extent song melodies are constrained by the melody of speech. As far back as 1934, the ethnomusicologist George Herzog asked, “Does music slavishly follow speech-melody in [tone] languages, or does it merely accept raw material with which to work? Is there a clash between melodic patterns based on speech, and purely musical melodic patterns or tendencies? And how far can musical elaboration distort the speech-melody pattern of the song-text without making it incomprehensible to the listener?” (p. 454). Since Herzog’s time, many studies have been published on the relation of speech melody and song melody in tone languages (Feld & Fox, 1994, p. 31, cite 10 such studies; for an exemplary quantitative study, see Richard, 1972).
A case of strong correspondence between speech and song melody occurs in Cantonese opera (Yung, 1991). Southern coastal Cantonese has nine linguistic tones, consisting of five level tones and four contour tones (cf. Chapter 2 for a definition of these types of linguistic tones). Yung notes that close correspondence between the contour of a spoken text and its associated musical melody is characteristic of the genre. However, he also notes that in other parts of China such close matching between text and music is not observed, suggesting that the key factor is not intelligibility. Instead, he argues that the close correspondence springs from practical concerns. Singers of Cantonese opera often must memorize operas very quickly, and often have to sing different operas each day for several days. Typically, they are given a script a few days before a performance, and must perform without music notation or rehearsal. In this situation, Yung suggests that singers compose as they perform, using “the relatively less defined pitches of linguistic tones as a guide . . . in creating a series of well-defined pitches to form the melodic line of the music” (pp. 416–417). Thus the case of Cantonese opera is likely to be unusual in terms of the strong influence of speech melody on song.
A more representative case is reported by Herzog (1934), who studied songs in Jabo, a tone language from Liberia with four discrete levels. He found a definite relationship between the shape of song melodies and the pattern of linguistic tones on the underlying words, but he also found that the relationship was not a slavish one. A given linguistic tone did not have a one-to-one correspondence with a given musical tone, and musical considerations such as motivic similarity and balancing of phrase structure often led the music to contradict the shape suggested by the speech melody. Presumably, these factors left enough of the speech melody intact so that the meanings of words were clear from context. Thus it may be that the only strong predictive power of speech melody in tone languages is in suggesting what a song melody will not be: namely, it will rarely have a contour completely contrary to that of the spoken melody (cf. Blacking, 1967; Wong & Diehl, 2002).
Research on the relation of spoken and musical melodies in non-tone languages is much less developed than in tone languages. In one study, Arnold and Jusczyk (2002) examined spoken and sung versions of three nursery rhymes. The spoken versions were made by individuals who were unfamiliar with the musical melodies that normally accompanied them, and who read them as texts of connected prose (rather than as verse organized into couplets). To conduct a cross-domain comparison, musical pitch contours were analyzed simply in terms of high and low pitch targets, in other words, maxima and minima in the song’s F0 track that were aligned with stressed syllables. Speech contours were analyzed using the ToBI system to identify high and low tones (cf. section 4.3.1). The authors found that pitch targets in the sung version did tend to correspond in type (high or low) to those realized on the same words in the text, though the amount of correspondence varied widely across the three nursery rhymes. Clearly, this is an area that is open for research. One idea for quantifying the degree of correspondence between linguistic intonation and musical melody in songs is to use the prosogram to identify the melodic contour of the lyrics when they are spoken, and then to compare this contour to the musical melodic contour to which the text is set in a song. It would be interesting to know if songs with melodies congruent with the spoken pitch contour have an advantage in terms of how easily they are learned or remembered (cf. Patel, 2006a).
The purpose of this section is to delve into two areas in which direct comparisons of musical and linguistic melody are proving fruitful. Section 4.5.1 addresses structure, and shows that a specific aspect of the statistics of speech intonation is reflected in instrumental music. Section 4.5.2 concerns processing, and provides evidence that perception of melodic contour in speech and music engages overlapping cognitive and neural machinery. This latter section also outlines a hypothesis (the “melodic contour deafness” hypothesis) that aims to account for the apparent dissociation of musical and linguistic melody perception in musically tone-deaf individuals. As we shall see, one implication of this hypothesis is that a behavioral dissociation is not the same as a neural dissociation.
Analysis of intonation in terms of the syllable tones of the prosogram (section 4.3.2) opens the way to quantitative comparisons of melody in speech and music. For example, musical melodies are characterized by a number of statistical regularities that appear cross-culturally (summarized in Huron, 2006). To what extent are these regularities unique to music versus shared with speech? If some regularities are shared, what is the cognitive significance of this fact?
One well-known regularity of musical melodies is the predominance of small (“conjunct”) intervals between successive pitches. Figure 4.13 shows the relative frequency of intervals of different sizes in a sample of Western music (Vos & Troost, 1989). The general pattern is similar to that seen when melodies from a range of cultures are sampled (Huron, 2006:74).
As is clear from the figure, musical melodies are dominated by small intervals. Another way of putting this is that most melodic motion is by small steps (of 2 semitones or less). The predominance of small intervals is reflected in listeners’ expectations for how novel melodies will continue when stopped midstream (cf. section 4.2.4), showing that this is a perceptually relevant feature of melodies.
Why do small intervals dominate melodies? The two most commonly proffered reasons are motor and perceptual. The motor explanation suggests that small intervals are easier to produce in succession than large ones, with the voice and with most instruments (cf. Zipf, 1949:336–337). The perceptual explanation suggests that too many large pitch movements risk splitting a melody into separate perceptual streams, destroying the perceptual cohesion between successive tones (Bregman, 1990; cf. Narmour, 1990). Of course, the motor and perceptual explanations are not mutually exclusive: Indeed, they may reinforce each other. One can thus group them into a single category of “constraint-based explanations” based on the notion of intrinsic constraints on production and perception.
Figure 4.13 Histograms showing the relative proportion of pitch intervals of different sizes in Western melodies (classical and rock: white bars; folk, dark bars). From Vos & Troost, 1989.
An alternative to constraint-based theories is the idea that the preference for small intervals in music arises out of experience with speech. That is, if speech melodies are dominated by small pitch movements, and if listeners absorb the statistics of spoken pitch patterns in their environment, then this might influence their proclivities in terms of shaping musical melodies. This hypothesis invokes a perceptual mechanism known as “statistical learning,” which has been discussed in Chapters 2 and 3. Statistical learning refers to tracking patterns in the environment and acquiring implicit knowledge of their statistical properties, without any direct feedback. Statistical learning is discussed in more detail later in this chapter. For now, the relevant point is that statistical learning may provide one mechanism for a cross-domain influence of speech on music.
How can one test this idea? The first step is to determine if speech melodies are in fact dominated by small pitch movements. Consider once again the prosogram in Figure 4.4b. The intervals between adjacent level tones are listed in Table 4.2. (Note that there are 18 tones in the prosogram of Figure 4.4b, but only 15 intervals appear in Table 4.2. This is because one of the prosogram tones is a glide, and intervals are computed only between immediately adjacent level tones.) One fact that is immediately apparent from this table is that most intervals are quite small, reminiscent of the pattern found in music. How representative is this pattern? Figure 4.14 shows a histogram of intervals between level tones in a corpus of 40 English and French sentences that have been analyzed using prosograms like that in Figure 4.4b.
As can be seen, intonational pitch patterns are dominated by small pitch movements. In Figure 4.14, 56% of all intervals between level tone pitches are less than or equal to 2 semitones. One notable difference between the distribution of intervals in speech and music is that in musical melodies intervals of 2 semitones are more common than intervals of 1 or 0 semitones (see Figure 4.13). No such tendency is seen in speech. (In Figure 4.14, the percentages of intervals between 0-1, 1-2, and 2-3 semitones are 35%, 21%, and 19%, respectively.) This predominance of 2-semitone intervals in musical melodies likely reflects the fact that intervals of 2 semitones are more common than intervals of 1 semitone in musical scales (cf. Chapter 2). Despite this difference between music and speech, the larger point is that speech melodies, like musical melodies, are dominated by small pitch intervals. This is probably not a peculiarity of English and French; it is likely to hold for languages generally.
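Interval statistics of this kind are straightforward to compute once a prosogram has reduced a sentence to a sequence of level tones: each interval is 12 times the base-2 logarithm of the ratio between successive pitches. A minimal sketch (the tone frequencies below are hypothetical, not values from the corpus):

```python
import math

def semitone_intervals(level_tone_hz):
    """Intervals in semitones between successive level-tone pitches."""
    return [12.0 * math.log2(b / a)
            for a, b in zip(level_tone_hz, level_tone_hz[1:])]

def proportion_small(intervals, max_st=2.0):
    """Proportion of intervals whose magnitude is <= max_st semitones."""
    return sum(abs(i) <= max_st for i in intervals) / len(intervals)

# Hypothetical level-tone sequence (Hz) from a prosogram of one sentence:
tones = [210.0, 200.0, 215.0, 170.0, 165.0]
ivals = semitone_intervals(tones)
print(round(proportion_small(ivals), 2))  # 0.75
```

Note that working in semitones (a log scale) rather than raw Hz differences mirrors the logarithmic frequency axis of the prosogram figures.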
Table 4.2 The Size of Pitch Intervals in Semitones Between Adjacent Level Tones in a Prosogram of an English Sentence (cf. Figure 4.4b)
Finding that small intervals predominate in speech melodies is consistent with the idea that music reflects speech. However, it is also consistent with the idea that pitch patterns in speech and music are subject to the same set of motor and perceptual constraints. This illustrates a basic problem in interpreting commonalities between statistical regularities in music and speech: Such commonalities could be due to the operation of some physiological or perceptual factor that influences both speech and music, rather than to any cross-domain influence.
Hence if one is interested in the possibility of speech influencing music, some strategy is needed to eliminate the problem of common external influences. One such strategy is to look for quantifiable differences between the statistics of speech intonation in different cultures, and to see if these differences are reflected in instrumental music of these cultures. Because there is no way that general motor or perceptual factors can explain differences in speech melody between languages, a reflection of linguistic differences in music makes a stronger case for cross-domain influence.
Figure 4.14 Histogram showing the relative proportion of pitch intervals of different sizes (n = 543 total) between level tones in prosograms of British English and French sentences. Histogram bin size = 0.5 st.
This “difference-based” approach to comparing language and music has been applied by Patel, Iversen, and Rosenberg (2006) to the speech and instrumental classical music of England and France. This study complemented an earlier study with a similar approach, which focused on rhythm (Patel & Daniele, 2003a; cf. Chapter 3, section 3.5.1). Once again, the topic of investigation was the provocative claim (made by a number of musicologists and linguists over the past 50 years) that a culture’s instrumental music reflects the prosody of its native language. For example, the linguist Hall (1953) suggested a resemblance between Elgar’s music and the intonation of British English.
In addressing this idea, we used the same corpus of speech and music as in our earlier work on rhythm. That is, the speech corpus consisted of short, newslike utterances read by native speakers of each language, whereas the musical corpus consisted of themes from instrumental classical music by 16 composers whose lives spanned the turn of the 20th century (such as Elgar and Debussy). Although we focused on just two cultures, a principal goal of this research was to establish methods of broad applicability in comparative speech-music studies.
We computed prosograms for all sentences in our speech corpus, resulting in representations like that in Figure 4.4b. Then, focusing on just the level pitches (which represented about 97% of pitches assigned by the prosogram algorithm, the rest being glides), we computed two measures of melodic statistics for each sentence. The first was variability in pitch height, which simply measured how widely spread the level pitches were around their mean pitch. The second was variability in pitch interval size within a sentence, in which an interval was defined as the jump in pitch between immediately successive level pitches. This measure serves to indicate whether steps between successive level pitches tend to be more uniform or more variable in size.
Initially it may seem odd to focus on pitch intervals in speech. Although the human perceptual system is quite sensitive to interval patterns in music (in which, for example, a melody can be recognized in any transposition as long as its interval pattern is preserved), music features well-defined interval categories such as minor second and perfect fifth, whereas speech does not (cf. section 4.1.1). Might the perceptual system attend to interval patterns in speech despite the lack of stable interval structure? Recent theoretical and empirical work in intonational phonology suggests that spoken pitch intervals may in fact be important in the perception of intonation, even if they do not adhere to fixed frequency ratios (Dilley, 2005; cf. Hermes, 2006). If this is the case, then one can understand why composers might have implicit knowledge of the “spoken interval statistics” of their language, which might in turn be reflected in the music that they write.
We used semitones in our measurements of pitch height and pitch intervals in speech, reflecting the perceptual scaling of intonation (Nolan, 2003).15 We also used semitones in measuring pitch height and pitch intervals in music, working directly from music notation. The height measurements revealed no significant difference in variability between English and French speech, nor between English and French music. The results of interval measurements were more revealing. Spoken French had significantly lower pitch interval variability than spoken English, and music mirrored this pattern. That is, French musical themes had significantly lower interval variability than English themes. Put another way, as the voice moves from one syllable to the next in speech, the size of each pitch movement is more uniform in French than in English speech. Similarly, as a melody moves from one note to the next, the size of the pitch movement is more uniform in French than in English music. Figure 4.15 shows the results of the pitch interval variability measures combined with our previous findings on rhythm. Note that the measure of variability on the vertical axis is “melodic interval variability” (MIV), defined as 100 times the coefficient of variation (CV) of interval size (CV = standard deviation/mean). Scaling the CV by 100 serves to put MIV in the same general range of absolute values as nPVI.
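The MIV definition just given translates directly into code. A minimal sketch follows; whether the original analysis used the population or sample standard deviation is not stated here, so the population form is an assumption:

```python
import statistics

def miv(interval_sizes):
    """Melodic interval variability: 100 times the coefficient of
    variation (std dev / mean) of absolute interval sizes in semitones.
    Uses the population standard deviation (an assumption)."""
    sizes = [abs(s) for s in interval_sizes]
    return 100.0 * statistics.pstdev(sizes) / statistics.mean(sizes)
```

A melody whose successive steps are all the same size gets MIV = 0; more variable step sizes push MIV upward, which is why English (with its mix of small and large pitch movements) scores higher than French.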
We call Figure 4.15 an “RM space” (rhythm-melody space) plot, because it shows the position of a given language or music in a two-dimensional space with rhythm on one axis and melody on the other. One appealing thing about RM space is that the “prosodic distance” between two languages can be quantified via the length of a line connecting the points representing the mean (nPVI, MIV) of the two languages (see Patel et al., 2006 for details). In Figure 4.15, the prosodic distance between English and French speech is 27.7 RM units. In contrast, the line connecting the English and French music is only about 1/3 as long (8.5 RM units). Thus the musical difference is smaller than the linguistic one. This is not surprising, given that music is an artistic endeavor with substantial intracultural variation and (unlike speech) no a priori reason to follow rhythmic or melodic norms. What is remarkable is that despite this variation, quantitative differences emerge between the music of the two nations that reflect linguistic differences.
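The prosodic distance described above is the length of the line segment connecting two mean (nPVI, MIV) points, i.e., a Euclidean distance. A minimal sketch; the coordinates in the demo are illustrative, not the published means:

```python
import math

def rm_distance(point_a, point_b):
    """Euclidean distance between two (nPVI, MIV) points in RM space."""
    return math.hypot(point_a[0] - point_b[0], point_a[1] - point_b[1])

# Illustrative (made-up) coordinates: (nPVI, MIV) pairs for two corpora.
demo_distance = rm_distance((10.0, 20.0), (13.0, 24.0))
```

Because nPVI and MIV are both scaled to the same general range of absolute values, treating the two axes as commensurate is reasonable, which is what this distance measure implicitly assumes.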
By what route do speech patterns find their way into music? One oft-heard proposal is that composers borrow tunes from folk music, and these tunes bear the stamp of linguistic prosody because they were written with words. This might be termed the “indirect route” from speech to music. If this is the case, then there are no strong cognitive implications for speech-music relations, because the speech-music resemblance is simply the result of borrowing of music that has been consciously shaped to fit linguistic phrases.
Figure 4.15 Rhythm-melody (RM) space for speech and music. Error bars show +/– 1 standard error. See text for details.
We favor a different hypothesis, however. This is the idea that implicit learning of prosodic patterns in one domain (ordinary speech) influences the creation of rhythmic and tonal patterns in another domain (instrumental art music). This might be termed the “direct route” between speech and music, because contact between speech and music need not be mediated by an intermediate speech-music blend (such as songs). One advantage of the direct-route hypothesis is that it can account for the reflection of speech in music not thought to be particularly influenced by folk music (e.g., much of Debussy’s and Elgar’s work; Grout & Palisca, 2000).
The direct-route hypothesis centers on the notion of statistical learning of prosodic patterns in the native language. Recall that statistical learning refers to tracking patterns in the environment and acquiring implicit knowledge of their statistical properties, without any direct feedback. Statistical learning of prosodic patterns in one’s native language likely begins early in life. Research on auditory development has shown that infants are adept at statistical learning of phonetic/syllabic patterns in speech and of pitch patterns in nonlinguistic tone sequences (Saffran et al., 1996, 1999). Thus it seems plausible that statistical learning of rhythmic and tonal patterns in speech would also begin in infancy, especially because infants are known to be quite sensitive to the prosodic patterns of language (Nazzi et al., 1998; Ramus, 2002b). Of course, statistical learning of tone patterns need not be confined to infancy. Adult listeners show sensitivity to the distribution of different pitches and to interval patterns in music (Oram & Cuddy, 1995; Saffran et al., 1999; Krumhansl, 2000; Krumhansl et al., 1999, 2000). Importantly, statistical learning in music can occur with atonal or culturally unfamiliar materials, meaning that it is not confined to tone patterns that follow familiar musical conventions.
It is worth emphasizing that the direct-route hypothesis does not imply that speech prosody influences musical structure in a deterministic fashion. It simply implies that composers can be influenced by their implicit knowledge of their native language’s rhythm and melody. Music is an artistic medium, after all, and composers are free to do what they like. In particular, the influence of music of other cultures may override native linguistic influences on musical structure (see Patel & Daniele, 2003b; Daniele & Patel, 2004).
Two final points deserve mention. First, it remains to be explained why English speech should have a greater degree of pitch interval variability than French. One idea is that British English may use three phonologically distinct pitch levels in its intonation system, whereas French may only use two (cf. Willems, 1982; Ladd & Morton, 1997; cf. Figure 4.11 for one model of English intonation based on three distinct pitch levels). A compelling explanation, however, awaits future research. Second, our study focused on a very simple aspect of melodic statistics (pitch variability). More sophisticated analyses of melodic statistics are clearly called for. One idea is to study the alignment of pitch and duration patterns, to see if differences in the “joint accent structure” of speech melody between languages are reflected in music (cf. section 4.2.9, and Patel et al., 2006).
Melodic contour refers to a melody’s pattern of ups and downs of pitch over time, without regard to exact interval size. As discussed in section 4.2.3, melodic contour perception is important in both music and speech. Yet it is not obvious on a priori grounds whether the processing of melodic contour in the two domains is mediated by common cognitive and neural machinery. One might imagine that the lack of stable pitch interval structure in speech, combined with a speech-specific tendency for pitch declination (cf. section 4.1.1) would lead linguistic melodic contours to be processed differently from musical melodic contours. Indeed, Peretz and Coltheart (2003) have proposed a modular model of music processing in which melodic contour analysis is a domain-specific aspect of music perception, not shared with speech. Thus cross-domain studies of melodic contour perception are relevant to debates on the overlap of musical and linguistic processing in the brain.
In terms of evidence from cognitive neuroscience, reports of selective impairments of music, or “selective amusia,” are of particular relevance for the cross-domain study of melodic contour processing. The two sections below discuss melodic contour processing in two populations: individuals with “acquired amusia” and those with “congenital amusia,” or musical tone-deafness.
Acquired amusia refers to deficits in musical perception and/or production abilities following brain damage that are not simply due to hearing loss or some other peripheral auditory disorder (Marin & Perry, 1999; Peretz, 2006). The condition is reported relatively rarely, probably due to social factors. That is, individuals who experience dramatic changes in music perception after brain damage may not seek medical attention for this problem, or may not be directed by their physicians toward a neuropsychologist who studies music. From the standpoint of brain localization, amusia has been associated with damage to diverse regions of the brain (not just the auditory cortices), though there is a preponderance of cases with right-hemisphere damage in the published literature (Griffiths, 2002; Stewart et al., 2006).
Patel, Peretz, et al. (1998) examined intonation perception in two amusic individuals with different kinds of music perception deficits. The first participant, CN, was an “associative amusic” who was able to discriminate musical pitch and rhythm patterns but was unable to identify culturally familiar tunes, suggesting a selective difficulty in accessing stored representations for familiar tunes (Peretz, 1996). The second participant, IR, was an “apperceptive amusic” who could not discriminate musical pitch and rhythm patterns. Thus her deficit was at a lower level than that of CN.
To test the intonation perception of these individuals, Patel, Peretz, et al. (1998) used sentence pairs in which the members of the pair were lexically identical but had different intonation. Furthermore, computer editing of each sentence pair was used to ensure that the timing of syllables in the two sentences was identical and that intensity differences were minimized, so that the only salient cue for discrimination was pitch. These sentence pairs could take one of two forms: “statement-question pairs” in which one version of a sentence was a statement and the other a question (such as “He wants to buy a house next to the beach” spoken as a statement vs. a question), and “focus-shift pairs” in which the two sentences had contrastive focus on a different word (such as “Go in FRONT of the bank, I said” vs. “Go in front of the BANK, I said”).
In addition to these linguistic stimuli, pairs of nonlinguistic tone sequences were also presented for discrimination. These tone sequences were created from the intonation contours of the sentence pairs, replacing each syllable of a sentence with a tone whose pitch was fixed at the Hz value midway between the maximum and minimum F0 values for that syllable (Figure 4.16). The linguistic and nonlinguistic stimuli were thus matched for overall length and pitch range.
Tone onsets occurred at the vowel onset times of corresponding syllables, and tone offsets were determined by the offset of F0 within each syllable. Thus each tone sequence had the same temporal rhythm as the syllables in the parent sentence. All tones had a complex frequency structure consisting of a fundamental and a few harmonics of decreasing amplitude. An example of a sentence pair and its melodic analogs is given in Sound Examples 4.9a–d. (Note that the sentences are in French, as the study was conducted with French-speaking amusics. Similar sound examples in English are provided later.) The rationale behind the study was that if the amusics’ perceptual deficits were confined to music, they should perform well on discriminating the sentences but have difficulty with the tone sequences; in other words, a dissociation between speech and nonlinguistic tone sequence processing should be observed. On the other hand, if intonation and tone-sequence processing overlap in the brain, then similar performance on the two types of sequences should be found.16
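The conversion from intonation contour to discrete-tone analog can be sketched as follows. The data structure and field names are hypothetical (the actual study worked from measured F0 tracks of recorded sentences), but the core rule is the one stated in the text: each syllable becomes a tone fixed at the Hz value midway between its maximum and minimum F0, with onset at the vowel onset:

```python
def discrete_pitch_analog(syllables):
    """Map each syllable to a fixed-pitch tone whose frequency is midway
    in Hz between the syllable's max and min F0 (per the text's method).
    `syllables` is a toy structure: a list of dicts with hypothetical keys."""
    tones = []
    for syl in syllables:
        f0_mid = (max(syl["f0_hz"]) + min(syl["f0_hz"])) / 2.0
        tones.append({"onset_s": syl["vowel_onset_s"], "freq_hz": f0_mid})
    return tones

# Toy input: two syllables with sampled F0 values (Hz) and vowel onsets (s).
demo = discrete_pitch_analog([
    {"vowel_onset_s": 0.10, "f0_hz": [200.0, 220.0, 210.0]},
    {"vowel_onset_s": 0.45, "f0_hz": [190.0, 180.0]},
])
```

Note that collapsing each syllable's F0 excursion to a single midpoint value compresses the sentence's overall pitch variation, a point that becomes important in the discussion of Patel, Foxton, and Griffiths (2005) below.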
Figure 4.16 Illustration of the process of converting the intonation contour of a sentence into a discrete tone analog. The waveform of a French sentence is shown in (A) and its F0 contour in (B). In (C), the F0 of each syllable has been set to a fixed value. (D) shows the waveform of the tone analog. Adapted from Patel, Peretz, et al., 1998.
The results of the study supported the second conclusion. CN did well on both the linguistic and nonlinguistic sequences, whereas IR had difficulty discriminating both types of sequences. Furthermore, IR’s problems could not be attributed to simple low-level perceptual problems in perceiving pitch patterns, because further experiments showed that she could accurately label individual statements and questions as either “statement” or “question,” and could identify which word in a given focus-shift sentence carried the main emphasis. Thus it appeared that IR’s intonation perception was only mildly compromised, but her memory for melodic contours suffered (cf. Belleville et al., 2003). This interpretation was supported by a comparison of the lesion profiles of CN and IR. Compared to CN, IR had additional damage in left primary auditory cortex and right frontal cortex (Figure 4.17).
The latter lesion is likely to be associated with a memory-based deficit, because right frontal and temporal cortex has been implicated in pitch memory tasks in both speech and music (Zatorre et al., 1994). This suggested that both intonation and nonlinguistic tone sequences rely on common brain regions when melodic contours must be retained in working memory (cf. Semal et al., 1996).17
For those interested in comparing melody perception in music and speech, the phenomenon of musical tone-deafness (henceforth mTD) is of particular interest. Also referred to as “congenital amusia” (Peretz & Hyde, 2003), mTD refers to severe problems with music perception and production that cannot be attributed to hearing loss, lack of exposure to music, or any obvious nonmusical social or cognitive impairments. As mentioned in section 4.1.1, musically tone-deaf individuals (henceforth mTDIs) are typically unaware when music, including their own singing, is off-key. For example, they have difficulty detecting out-of-key notes in novel melodies, a task that most individuals (even those with no musical training) find quite easy (Ayotte et al., 2002). They also have difficulty discriminating and recognizing melodies without lyrics, even melodies that are quite common in their culture.
Figure 4.17 (a) Transverse CT scans of the amusic individual IR. Note that the right side of the brain is shown on the left of each image, per radiological convention. Scans proceed superiorly in 10-mm increments, in the order top left, top right, bottom left, and bottom right. The scans show bilateral temporal and right inferior frontal lobe damage (see Patel, Peretz, et al., 1998, for details).
It is important to distinguish true mTD from the “tone deafness” label that people sometimes apply to themselves. In survey studies, about 15% of people self-define as tone deaf (Cuddy et al., 2005). Research reveals that many such people are in fact referring to their poor singing skills, even if they have a keen ear for music and show no music perception deficits (Sloboda et al., 2005). Lack of singing ability in such individuals may simply reflect a lack of training, which could presumably be ameliorated by guided practice. It is also important to distinguish mTD (which appears to have its roots in pitch perception deficits, as discussed below) from a less commonly reported disorder in which musical sounds lose their normal timbral qualities, being heard, for example, as “the banging of pots and pans” (cf. Sacks, 2007, for descriptions of such cases).
An important fact about mTD is that it is not due to lack of exposure to music, as many mTDIs report having music lessons in childhood and/or coming from musical households (Peretz & Hyde, 2003). Furthermore, research on twins suggests that there are specific genes that put one at risk for this condition (Drayna et al., 2001). Thus it makes sense that most mTDIs report having their condition for as long as they can remember. Indeed, one oft-heard story is that an mTDI first discovered his or her problem in childhood, when a choir teacher in school asked them to stop singing and simply move their lips to the music. Until that time, he or she was unaware of any musical problem.
Although the existence of mTD has been noted for over a hundred years (Allen, 1878), systematic research on this phenomenon is a relatively recent endeavor (e.g., Peretz et al., 2002; Ayotte et al., 2002; Peretz & Hyde, 2003; Foxton et al., 2004). Research in this area is poised to grow rapidly, because mTDIs are estimated to comprise about 4% of the population (Kalmus & Fry, 1980) and are thus easier to locate and work with than acquired amusics.
For the current purposes, the significance of mTD is that it manifests as a highly selective deficit for music, with no readily apparent consequences for other cognitive abilities (though see Douglas and Bilkey, 2007). Indeed, mTDIs may excel in other domains: Their ranks include numerous famous individuals, including the Nobel-prize–winning economist Milton Friedman. This makes mTD particularly attractive for the comparative study of speech and music perception. Returning to the topic at hand, what is known about musical versus spoken melodic contour perception in mTD?
In one relevant study, Ayotte et al. (2002) tested mTDIs for their ability to discriminate between sentences that differed only in intonation, and to discriminate between nonlinguistic tone sequences created from the intonation patterns of the sentences. (The tone sequences were created using the methods of Patel, Peretz, et al., 1998, described in the previous section.) In contrast to the earlier findings of Patel, Peretz, et al., which had focused on acquired amusia, Ayotte et al. found a dramatic dissociation between performance on the sentences and their nonlinguistic analogs. Specifically, mTDIs had no difficulty discriminating between sentences differing in intonation, but had substantial difficulty discriminating between the corresponding tone sequences. Controls, in contrast, performed equally well in both domains. This suggests a dissociation between melodic contour processing in speech and music, with normal speech intonation processing but impaired musical melodic contour processing.
How can the dissociation reported by Ayotte et al. (2002) be explained? Psychophysical research with mTDIs has revealed that they have deficits in detecting fine-grained pitch changes. For example, Hyde and Peretz (2004) had mTDIs listen to sequences of 5 high-pitched piano tones (C6, which has a fundamental frequency of 1046 Hz), the fourth of which could differ in pitch from the others. The listeners had to indicate if the sequence contained a pitch change or not. To reach a criterion of 75% correct, mTDIs needed a pitch change of 1/2 semitone. Controls, in contrast, were nearly 100% correct for the smallest pitch change used: 1/4 semitone. Furthermore, mTDIs did not perform as well as controls until the pitch change was 2 semitones. Based on such findings, Peretz and Hyde (2003) suggested that pitch contrasts in the sentences used by Ayotte et al. (2002) were coarser than in the tone sequences, and were thus large enough to overcome the mTDIs’ deficits in pitch change detection. This was a reasonable suggestion, as the process of converting F0 contours to tones used by Patel, Peretz, et al. (1998) involved converting the dynamically changing pitch within each syllable to a single pitch near the mean F0 of the syllable, thus compressing the overall amount of pitch variation in a sentence. Peretz and Hyde’s suggestion also led to a testable prediction: namely, if the nonlinguistic tone sequences followed the original F0 contours exactly, then tone-deaf individuals should be able to discriminate them without difficulty. Patel, Foxton, and Griffiths (2005) tested this idea, using a population of mTDIs from the United Kingdom. The stimuli were based on an English version of the prosody-music battery, and concentrated on focus-shift pairs in language and their nonlinguistic tone analogs.
Patel, Foxton, and Griffiths (2005) made two types of nonlinguistic analogs from the focus-shift pairs. The first type was the same as used previously, in other words, each syllable was replaced by a tone of fixed pitch set to the Hz value midway between the maximum and minimum F0 values within that syllable. These were referred to as the “discrete pitch” analogs. In the other (new) type of analog, each tone’s pitch exactly followed the F0 contour within the syllable, gliding up and/or down just as the F0 did: These were referred to as the “gliding pitch” analogs. In both types of sequences, tone onsets occurred at the vowel onset times of corresponding syllables, and tone offsets were determined by the offset of F0 within each syllable. Thus each tone sequence had the same temporal rhythm as the syllables in the parent sentence. All tones had a fundamental and several harmonics, giving the analogs a clarinet-like quality (Sound Examples 4.10a–f). The three types of stimuli (speech, discrete pitch analogs, and gliding pitch analogs) were presented in separate blocks, and scoring of performance was based on percentage of hits minus percentage of false alarms, in accordance with the procedure of Ayotte et al. (2002).18 Figure 4.18 shows results on the three tasks.
Consistent with the findings of Ayotte et al. (2002), tone-deaf individuals were significantly better at discriminating sentences based on intonation than they were at discriminating discrete-pitch analogs of the intonation contours (cf. Ayotte et al., 2002, Figure 3). Surprisingly, however, tone-deaf individuals also had difficulty discriminating gliding-pitch analogs that exactly mimicked the intonation patterns of the sentences. In fact, performance on the discrete and gliding pitch analogs was indistinguishable, and performance on the gliding pitch analogs was significantly worse than on the sentences.
Figure 4.18 Performance of musically tone-deaf individuals on discrimination of three types of pitch patterns: intonation contours in speech, nonlinguistic analogs of intonation contours based on discrete pitches, and nonlinguistic analogs of intonation contours based on gliding pitch movements that exactly replicate the pitch pattern of intonation. The vertical axis shows percentage of hits minus percentage of false alarms. Error bars show 1 standard error. From Patel, Foxton, & Griffiths, 2005.
These findings are particularly striking in light of certain psychophysical findings by Foxton et al. (2004) with these same mTDIs. Foxton et al. examined thresholds for detecting a pitch change between two successively presented pure tones. In one condition (“segmented pitch-change detection”), the tones were separated by a short silent interval. In another condition (“gliding pitch-change detection”), this interval was filled by a linear frequency ramp that bridged the pitch difference between the tones. For the mTDIs in the study of Patel, Foxton, and Griffiths (2005), the threshold for pitch-change detection was significantly smaller when pitches were connected by an intervening glide than when they were separated by a silent interval (threshold = 0.21 st for gliding pitch change, versus 0.56 st for segmented pitch change, based on 75% correct discrimination).
Thus not only do mTDIs have difficulty discriminating gliding-pitch analogs of intonation, they have these difficulties despite the fact that they have substantially smaller thresholds for detecting pitch changes in gliding-pitch patterns than in segmented pitch patterns. It therefore appears that the relatively normal intonation perception of mTDIs cannot be explained by the idea that intonation uses coarse pitch contrasts that exceed their psychophysical thresholds for pitch-change detection.
Clearly, a hypothesis is needed that can account for the normal performance of mTDIs in discriminating spoken intonation contours versus their impaired performance in discriminating the same contours extracted from a phonetic context. In the following section, I propose one such hypothesis, based on the idea that melodic contour perception in speech and music does in fact rely on common neural circuitry.
Prima facie, the finding of Patel, Foxton, and Griffiths (2005) seems to be good evidence for the modularity of contour processing (Peretz & Coltheart, 2003) because pitch contours in speech are discriminated better than the same contours extracted from speech. There is another possibility, however, as explored in this section. I call this the “melodic contour deafness hypothesis.” This hypothesis proposes that mTDIs have equivalent problems in judging the direction of pitch change in speech and music, but that intonation perception is largely robust to this problem, whereas music perception is not. Before discussing how this hypothesis can account for the dissociations observed in the study of Patel, Foxton, and Griffiths (2005), it is worth delving into the findings that motivated the development of this hypothesis.
One such finding was the discovery that some mTDIs do in fact have problems with intonation perception in language. Specifically, Lochy et al. (2004; cf. Patel, Wong, et al., in press) tested a group of mTDIs using the sentence stimuli of Patel, Peretz, et al. (1998), and found that some had deficits in same/different discrimination of linguistic statement-question pairs. That is, when asked to tell whether two sentences were acoustically identical or not, they had difficulty when one sentence was a question and the other was a statement, a task that control participants found easy. In contrast, they did well when the two sentences were members of a focus-shift pair, in which emphasis (as signaled by pitch movement) was on a different word in each sentence (cf. two subsections back for examples of a statement-question and a focus-shift pair). The critical difference between these tasks is that the statement-question task requires discriminating the direction of pitch movement on the same word (up versus down), whereas the focus-shift task simply requires detecting a salient pitch movement within a sentence, because different words bear the large movement in the two members of a focus-shift pair. That is, sensitivity to the direction of pitch movement is irrelevant to the focus-shift task: As long as one can detect a pitch change, and can remember that this change happened on the same or different words in the two sentences, one can solve the task.
Another finding that motivated the melodic contour deafness hypothesis was the discovery that mTDIs have marked deficits in nonlinguistic tasks that require the perception of pitch direction. Figure 4.19 shows thresholds for 75% correct performance on three types of tasks: segmented pitch-change detection, gliding pitch-change detection, and pitch-direction judgments (Foxton et al., 2004).19
The first two tasks are simple pitch-change detection tasks, and were described in the previous section. The third task involves judging pitch direction: Listeners hear two pairs of pure tones and decide in which pair the pitch goes up. (Pitch always goes down in one pair and up in the other. In both pairs, tones are connected by a glide.) Note the dramatic difference between mTDIs and controls on this latter task: Their thresholds are about 20 times higher than those of controls, compared to thresholds that are 2 or 3 times worse than controls in terms of pitch-change detection.
The data of Lochy et al. (2004) and Foxton et al. (2004) suggest that mTDIs have problems with pitch-direction perception in both speech and nonlinguistic sounds. If the melodic contour deafness hypothesis is correct, then a common processing deficit for pitch direction underlies these deficits, but speech intonation perception is largely robust to this deficit, whereas music perception is not.
Why would intonation perception be robust to the deficit? One reason is that in intonation languages such as English (in which pitch does not distinguish lexical items, as it does in tone languages), the direction of a pitch change is seldom crucial to understanding. For example, if a pitch movement is used to signal focus on a word, it may matter little to a listener if the movement is upward or downward, as long as it is salient and detectable. Although the direction of pitch movement is important for distinguishing statements from questions, there are often redundant syntactic, semantic, or contextual cues to indicate whether an utterance is a question or not. Hence a pitch direction deficit in speech may be largely asymptomatic, revealing itself only in controlled situations in which pitch direction is crucial to the task and redundant sources of information are suppressed. A second reason that intonation perception may be robust to a pitch-direction deficit is that such a deficit is not all-or-none, but a matter of degree. Recall from Figure 4.19 that the direction deficit is defined by an elevated threshold for accurate detection of pitch direction. In the study of Foxton et al. (2004), mTDIs succeeded on the pitch-direction task when pitch movements were significantly above their threshold.
Hence the key question is how the size of linguistically relevant pitch movements in speech compares to the pitch direction thresholds of mTDIs. Consulting Figure 4.19, it can be seen that the average pitch-direction threshold for mTDIs is a little above 2 semitones. How does this compare to the size of rising or falling pitch accents in English (e.g., L+H* movements in autosegmental-metrical theory)? Existing research on intonation suggests that 2 semitones is near the low end of the spectrum for rising or falling pitch accents in speech (e.g., Xu & Xu, 2005; Arvaniti & Garding, in press). In other words, most linguistically relevant pitch movements are likely to be in excess of the pitch-direction thresholds of mTDIs.20 However, circumstances may arise in which an individual with a large threshold hears a linguistically relevant pitch movement below their threshold, and cannot determine its direction. Such a circumstance may explain the cases found by Lochy et al. (2004), who could not discriminate questions from statements.
Figure 4.19 Performance of tone-deaf individuals on three kinds of pitch perception tasks: segmented pitch-change detection, gliding pitch-change detection, and pitch-direction detection. Error bars show 1 standard error. Data from the study by Foxton et al. (2004).
Turning to music, why would music perception be so severely affected by a pitch direction deficit? As noted earlier in this chapter (section 4.5.1), musical melodies are dominated by small pitch intervals of 2 semitones or less in size. Note that this is below the average pitch-direction threshold of mTDIs. This suggests that mTDIs often cannot tell if a musical melody is going up or down in pitch. This degraded perception of musical melodic contour would make it very difficult to gauge crucial aspects of melodic structure such as motivic similarity and contrast. Furthermore, without an accurate mental representation of melodic contour, there would be no developmental framework for learning musical intervals (Dowling, 1978). Hence mTDIs would never acquire the normal tonal schemata for music perception, which might explain why they fail to detect off-key (“sour”) notes in music.
How can the melodic contour deafness hypothesis account for the dissociations observed in the study of Patel, Foxton, and Griffiths (2005)? Consider the focus-shift questions used in the speech intonation task. In such sentences, the salient pitch movement is on a different word (as noted earlier in this section). Thus, a same/different task with such sentences can be solved with a “semantic recoding strategy,” in other words, by simply listening for a word with a salient pitch movement in each sentence, and then deciding if these words are the same or different. If the intonation contours are separated from their lexical context, however, as in the discrete-pitch and gliding-pitch analogs of intonation, then this strategy is no longer possible and success depends on remembering the patterns of ups and downs of pitch over time. Hence a problem in perceiving pitch direction could disrupt this task.
An important question about the melodic contour deafness hypothesis concerns its proposed neural foundation. In this regard, it is interesting to examine the relationship between mTDIs’ problems with pitch direction and their problems with simple pitch-change detection. Figure 4.20 shows thresholds for pitch-change detection versus pitch-direction detection for 12 mTDIs. As can be seen, the pitch-change detection thresholds do not predict pitch-direction thresholds.
This suggests an independent pitch-direction deficit. There are reasons to believe that such a deficit could arise from abnormalities in right auditory cortex. Research on patients with surgical excisions of temporal lobe regions has revealed that individuals with excisions of right secondary auditory cortex (lateral Heschl’s gyrus) have pronounced deficits in judging pitch direction, even though their thresholds for simple pitch-change detection are normal. In contrast, patients with comparable excisions of left auditory cortex show no such direction deficits (Johnsrude et al., 2000). Evidence supporting a link between pitch-direction detection and melodic contour perception is the fact that both are disrupted by lesions to right auditory cortex (Johnsrude et al., 2000; Liégeois-Chauvel et al., 1998).
Neurophysiological research on animals also supports a right-hemisphere basis for coding of pitch direction (Wetzel et al., 1998), and suggests that direction sensitivity may arise from patterns of lateral inhibition among cortical neurons (Shamma et al., 1993; Ohl et al., 2000; cf. Rauschecker et al., 1998a, 1998b). Indeed, asymmetric lateral inhibition has been successfully used to give cells pitch-direction sensitivity in computational models of auditory cortex (e.g., Husain et al., 2004). In these models, direction sensitivity emerges from specific anatomical patterns of inhibitory connections between neighboring cells in tonotopic maps.
Figure 4.20 The relationship between performance on the segmented pitch-change detection task and the pitch-direction detection task for 12 musically tone-deaf individuals. The best fitting regression line is shown. The severity of the pitch-direction deficit is not predicted by the severity of the pitch-change detection deficit. Equation for regression line: Direction threshold = 1.16 * change threshold + 1.57 (r² = 0.02, p = .65).
Thus one plausible hypothesis for the etiology of the pitch-direction deficits in mTDIs is abnormal wiring of inhibitory connections in tonotopic maps of right secondary auditory cortex. This miswiring may occur early in development due to genetic factors (Drayna et al., 2001). Such miswiring may not be detectable using macro-imaging methods such as MRI, and may require cellular-level investigations (e.g., postmortem histology) or electrophysiological techniques (cf. Peretz et al., 2005). Interestingly, in-vivo structural brain imaging has revealed that mTDIs have a neural anomaly in the right cerebral hemisphere (Hyde et al., 2006). However, the anomaly is in a region of right inferior frontal cortex, which is thought to be involved in short-term memory for pitch patterns. The anomaly consists of a decrease in white matter tissue (i.e., the fibers that connect nerve cells rather than nerve cell bodies themselves). At the moment, it is not known whether the anomaly is inborn or if it arises in development due to anomalous connections between right frontal regions and auditory cortex, particularly because both gray and white matter patterns in the frontal cortices develop well into adolescence (Toga et al., 2006). For example, a problem in perceiving melodic contour (arising in auditory cortex) may lead to the underdevelopment of normal musical memory skills, because it would disrupt recognition of motivic similarity across time, a very basic aspect of melody perception that depends on short-term memory for pitch patterns (cf. section 4.2.5). A pitch-memory deficit might explain why mTDIs have deficits in discriminating the melodic contour of two short melodies even when the steps between pitches are large, and hence presumably above their pitch-change detection and pitch-direction detection thresholds (Foxton et al., 2004).
Taking a step back, the significance of the melodic contour deafness hypothesis is that it suggests that a behavioral dissociation between music and speech perception in everyday life may disguise a neural commonality in the processing of spoken and musical melodies. In other words, a behavioral dissociation is not necessarily the same as a neural dissociation: A nondomain-specific deficit can give rise to domain-specific problems because of the different demands that each domain places on the ability in question.
The linguist Dwight Bolinger once commented that “since intonation is synonymous with speech melody, and melody is a term borrowed from music, it is natural to wonder what connection there may be between music and intonation” (Bolinger, 1985:28). Although this connection has interested scholars for hundreds of years, empirical comparisons of musical and spoken melody are a recent endeavor. This chapter has shown that despite important differences between the melodic systems of the two domains (such as the use of pitch interval categories, a regular beat, and a tonal center in musical melodies), there are numerous points of contact between musical and linguistic melody in terms of structure and processing. For example, the statistics of pitch patterning in a composer’s native language can be reflected in his or her instrumental music. Furthermore, neuropsychological research indicates that melodic contours in speech and music may be processed in an overlapping way in the brain. These and other findings suggest that musical and spoken melody are in fact more closely related than has generally been believed, and invite further research aimed at refining our understanding of the relations between these two types of melody.
This is an appendix for section 4.1.2.
To convert a value in Hz to the semitone scale (st) on the y-axis of the prosogram:
st = 12 * log2(X), where X is the value in Hz.
To convert an st value along the y-axis of a prosogram to Hz:
Hz = 2^(X/12), where X is the value in st.
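As a quick check, the two conversion formulas above can be sketched in Python; the function names are my own, not part of the prosogram software:

```python
import math

def hz_to_st(hz):
    """Convert a frequency in Hz to the prosogram's semitone scale (re 1 Hz)."""
    return 12 * math.log2(hz)

def st_to_hz(st):
    """Convert a value on the semitone scale back to Hz."""
    return 2 ** (st / 12)

# C6 (1046 Hz, the piano tone used by Hyde & Peretz, 2004) lies at about 120.4 st,
# and the two conversions invert each other.
print(round(hz_to_st(1046), 1))            # -> 120.4
print(round(st_to_hz(hz_to_st(1046)), 1))  # -> 1046.0
```

Because the scale is logarithmic, a fixed distance in st corresponds to a fixed frequency ratio, which is why semitone differences (rather than Hz differences) are the natural unit for comparing pitch movements across speakers with different voice ranges.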
1 It has been noted that the low pitch at the end of declarative utterances is often quite stable for a given speaker, likely reflecting physiological factors (Lieberman, 1967; Liberman & Pierrehumbert, 1984). This form of pitch stability, however, has nothing to do with structural relations between intonational tones.
2 The Czech composer Leoš Janáček (1854-1928) is an interesting exception. Janáček was fascinated by speech melodies and filled many notebooks with musical transcriptions of intonation contours (Wingfield, 1999; Pearl, 2006; Patel, 2006a).
3 Indeed, such a tendency has even been noted in the calls of nonhuman primates (Hauser & Fowler, 1992). However, it is important to note that declination is not always present: It is less frequent in spontaneous speech than prepared speech (Umeda, 1982) and is often suppressed in questions (Thorsen, 1980). Nevertheless, it is frequent enough to create expectations that influence pitch perception in a sentence context, as described in the text.
4 Although this chapter will focus on F0 as the physical correlate of speech melody, it is important to note that from a psychoacoustic standpoint the sensation of pitch in speech is derived not directly from F0 but is inferred from the frequency spacing of the harmonics of F0. That is, the pitch of the voice is an example of the perception of the “missing fundamental,” as is evident from the fact that the pitch of the male voice (~100 Hz) is easily heard over telephones, even though telephones filter out frequencies below 300 Hz. Harmonics 3-5 are thought to be especially important in voice pitch perception (see Moore, 1997, for more details).
5 For a discussion of the possible evolutionary basis of this sex difference, see Ohala (1983, 1994).
6 Steele’s observations about speech intonation, particularly its rapid rate of change, were remarkably prescient. Empirical work suggests that the movements of voice pitch in speech may in fact be near their physiological speed limit (Xu & Sun, 2002).
8 Although this chapter focuses on Western European tonal music, it is important to note that pitch hierarchies are not unique to this tradition. Some form of a tonal center or tonic is widespread in musical melodies of different cultures, in both art music and folk music (e.g., Herzog, 1926; Castellano et al., 1984; Kessler et al., 1984).
9 Of course, intonation is not the only cue to affect in speech (for example, voice quality also plays an important role; cf. Chapter 6 and Ladd et al., 1985). Also, there are cases in which affect influences intonation patterns in a more categorical (vs. continuous) fashion (Ladd, 1996: Ch. 1).
10 One of the few points of contact has been articulated by Fred Lerdahl (2003), who has applied the theoretical machinery of musical pitch analysis to derive intonation contours for lines of spoken poetry.
11 One difference between the IPO approach and the AM approach is that the former treats pitch movements as the primitives of intonation, whereas the latter treats pitch levels as the primitives. Evidence for the latter view includes the observation that the beginning and endpoints of pitch movements are more stable than the duration or slope of these movements (Arvaniti et al., 1998; Ladd et al., 1999). This difference in perspective is not irreconcilable, however: It is relatively easy to recast the IPO approach into a target-based framework, because movements always take place between well-defined start and endpoints. Furthermore, it may be that listeners hear some pitch events in intonation as level pitches and others as movements (cf. the prosogram, described later in this section).
12 The glissando threshold used in computing the prosograms in Figures 4.4 and 4.12 is 0.32/T² semitones/second, in which T is the duration (in seconds) of the vowel (in Figure 4.4) or rime (in Figure 4.12). If the rate of pitch change was greater than this threshold, the vowel (or rime) was assigned a frequency glide. The choice of this threshold is based on perceptual research on the threshold for detecting pitch movements in speech, combined with experiments in which prosogram output is compared to human transcriptions of intonation (’t Hart, 1976; Mertens, 2004b).
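The duration-dependent rule in this footnote can be sketched as a small function; the 0.32/T² constant is taken from the text, while the input values below are illustrative rather than drawn from the book's materials:

```python
def glissando_threshold(duration_s):
    """Glissando threshold in semitones/second for a vowel or rime of
    the given duration: the 0.32/T^2 rule described in the footnote."""
    return 0.32 / duration_s ** 2

def is_glide(pitch_change_st, duration_s):
    """A vowel (or rime) is assigned a frequency glide if its rate of
    pitch change exceeds the glissando threshold; otherwise it is
    stylized as a level pitch."""
    rate = abs(pitch_change_st) / duration_s
    return rate > glissando_threshold(duration_s)

# A 1-semitone movement on a 100-ms vowel: 10 st/s vs. a 32 st/s threshold -> level pitch.
print(is_glide(1.0, 0.1))    # -> False
# A 2-semitone movement on a 250-ms vowel: 8 st/s vs. a 5.12 st/s threshold -> glide.
print(is_glide(2.0, 0.25))   # -> True
```

Note that the threshold falls rapidly with duration, so the same pitch excursion is more likely to be stylized as a glide on a long syllable than on a short one.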
13 A recent version of the prosogram can perform automatic segmentation of a sentence into syllabic nuclei, removing the need for the user to supply vowel or rime boundaries.
14 The prosogram is freely available from http://bach.arts.kuleuven.be/pmertens/prosogram/, and runs under Praat, which is freely available from http://www.fon.hum.uva.nl/praat/.
15 Earlier work by Hermes and Van Gestel (1991) had suggested ERBs should be used in measuring pitch distances in speech. The precise choice of units for speech is unlikely to influence the results reported here. Our measure of pitch variability, the coefficient of variation (CV), is a dimensionless quantity that would allow one to measure pitch distances in speech and music in different units (ERBs vs. semitones), and still compare variability across domains with the CV as a common metric.
16 Note that this study was conducted before the authors were aware of the prosogram, and thus the construction of tonal analogs of intonation was based on intuitive criteria rather than on an explicit algorithm for F0 stylization. Nevertheless, the resulting pitch sequences do bear some resemblance to sequences that would be generated by a prosogram, although they lack any glides (cf. section 4.3.2).
17 Another amusic who has been studied with a similar paradigm has also demonstrated equivalent difficulties with intonation and nonlinguistic tone sequence analogs (Nicholson et al., 2003). In this case, however, the damage was in the right parietal cortex and this may have disrupted the ability to extract pitch patterns in both domains, rather than causing a problem in remembering these patterns (cf. Griffiths et al., 1997). It is also worth noting that IR was retested on the materials of Patel et al. approximately 10 years after the original study and showed a virtually identical pattern of performance. Thus her condition is very stable (Lochy & Peretz, personal communication, 2004).
18 A hit was defined as a different configuration pair that was classified as different, whereas a false alarm was defined as a same configuration pair classified as different.
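These definitions can be made concrete with a toy tally over invented trial data; the variable names are illustrative only.

```python
# Each trial pairs the true configuration type with the listener's response.
trials = [
    ("different", "different"),  # hit
    ("different", "same"),       # miss
    ("same", "different"),       # false alarm
    ("same", "same"),            # correct rejection
    ("different", "different"),  # hit
]

hits = sum(1 for pair, resp in trials if pair == "different" and resp == "different")
false_alarms = sum(1 for pair, resp in trials if pair == "same" and resp == "different")
hit_rate = hits / sum(1 for pair, _ in trials if pair == "different")
false_alarm_rate = false_alarms / sum(1 for pair, _ in trials if pair == "same")
```

Hit and false-alarm rates computed this way are the standard inputs to sensitivity measures such as d′ in signal detection theory.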
19 Data presented in Figures 4.19 and 4.20 were kindly provided by Jessica Foxton. The data come from the same set of participants doing different perceptual tasks. Two outliers have been excluded from the analysis (one amusic and one control).
20 Interestingly, in Mandarin, the rising and falling lexical tones appear to have a lower limit of about 2 semitones in connected speech (Xu, 1994, 1999), suggesting that these tones will exceed the pitch direction thresholds of most mTDIs. If this lower limit is representative for tone languages generally, it would suggest that mTD should have little consequence for speech perception in such languages under ordinary circumstances.
Chapter 5
Syntax
5.1 Introduction
5.2 The Structural Richness of Musical Syntax
5.2.1 Multiple Levels of Organization
Scale Structure
Chord Structure
Key Structure
5.2.2 The Hierarchical Structure of Sequences
Musical Event Hierarchies I: Structure and Ornamentation
Musical Event Hierarchies II: Tension and Resolution
Order and Meaning
5.2.3 Context Dependent Structural Functions
5.2.4 Some Final Comments on Musical Syntax
5.3 Formal Differences and Similarities Between Musical and Linguistic Syntax
5.3.1 Formal Differences
5.3.2 Formal Similarities: Hierarchical Structure
5.3.3 Formal Similarities: Logical Structure
5.3.4 Formal Differences and Similarities: Summary
5.4 Neural Resources for Syntactic Integration as a Key Link
5.4.1 Neuropsychology and Dissociation
5.4.2 Neuroimaging and Overlap
5.4.3 Using Cognitive Theory to Resolve the Paradox
Syntactic Processing in Language I: Dependency Locality Theory
Syntactic Processing in Language II: Expectancy Theory
Syntactic Processing in Music: Tonal Pitch Space Theory
Convergence Between Syntactic Processing in Language and Music
Reconciling the Paradox
5.4.4 Predictions of a Shared-Resource Hypothesis
Interference Between Linguistic and Musical Syntactic Processing
Musical Syntactic Deficits in Aphasia
5.5 Conclusion
The comparison of linguistic and musical syntax is a topic that has generated both warm enthusiasm and cool skepticism. The former reaction is illustrated by a set of lectures given by Leonard Bernstein at Harvard in 1973, later published as The Unanswered Question (1976). Bernstein, who had long been interested in the analysis of musical structure and meaning (Bernstein, 1959), found inspiration in the generative linguistic theory of Noam Chomsky (1972) and set out to analyze the grammar of Western tonal music in a linguistic framework. As part of his presentation, Bernstein made comparisons between linguistic and musical syntax. Although his musical knowledge and personal charisma make his lectures well worth watching, the details of his exposition were not persuasive to scholars in either linguistics or music. Keiler (1978) has enumerated several of the problems with Bernstein’s approach, which include rather strained analogies between linguistic parts of speech such as nouns and verbs and particular musical elements such as motives and rhythms.
Nevertheless, Bernstein’s efforts had an important effect on language-music studies. As a result of his lectures, a seminar on music and linguistics was organized at MIT in the fall of 1974, and two of the participants (the musicologist Fred Lerdahl and the linguist Ray Jackendoff) ultimately produced one of the most influential books in music cognition, A Generative Theory of Tonal Music (1983). The use of the term “generative” in their title refers to the use of formal procedures to generate a structural description of a given musical piece. This description focuses on four types of structural relations a listener perceives when hearing music. Two of these relations concern rhythm: grouping structure and metrical structure (cf. Chapter 3). The other two relations are more abstract, and concern hierarchies in the relative structural importance of tones (“time-span reduction”) and in the patterning of tension and relaxation over time (“prolongation reduction”). Although Lerdahl and Jackendoff adapted the tools of generative grammar to analyze music (cf. Sundberg & Lindblom, 1976), they did not focus on comparisons of linguistic and musical syntax. Indeed, they were skeptical of such comparisons, noting that “pointing out superficial analogies between music and language, with or without the help of generative grammar, is an old and largely futile game” (p. 5). In support of their skepticism, they point out specific differences between the two syntactic systems, including the lack of musical equivalents for linguistic parts of speech such as nouns and verbs, and differences in the way linguistic and musical “syntactic trees” are constructed (cf. section 5.3.1 below).
Despite Lerdahl and Jackendoff’s skepticism, comparisons between linguistic and musical syntax have continued to fascinate scholars. Theoretical treatments of the issue include work by musicologists and linguists (e.g., Swain, 1997; Horton, 2001; Tojo et al., 2006; Pesetsky, 2007; Rohrmeier, 2007). It is fair to say, however, that for each theorist who approaches the topic with enthusiasm, there is another who sounds a note of warning (e.g., Powers, 1980; Feld, 1974). A particularly fascinating example of this dialectic concerns two articles on Javanese gamelan music by leading ethnomusicological scholars (Becker & Becker, 1979, 1983), the first enthusiastically analyzing the grammar of this music in a linguistic framework and the second (by the same authors a few years later) rejecting this original approach as an exercise in empty formalisms yielding little new insight.
Theoretical comparisons between linguistic and musical syntax will no doubt continue for many years to come, with voices on both sides of the issue. In the past few years, however, something new has happened: Empirical studies of this topic have started to emerge in cognitive neuroscience. The appeal of this topic for modern brain science is easy to understand. Linguistic syntax is emblematic of the special abilities of the human mind and has been claimed to engage “domain-specific” cognitive mechanisms (i.e., mechanisms unique to language; see Fodor, 1983). The presence of a second syntactic system in the human mind naturally leads to the question of the relation between them. Are they mentally isolated, modular systems, or might there be cognitive and neural overlap?
This chapter is divided into three parts. The first provides background on musical syntax. The second part discusses formal differences and similarities between musical and linguistic syntax. The final part discusses what neuroscience has revealed about musical-linguistic syntactic relations in the brain. As we shall see, there is evidence for significant neural overlap in syntactic processing in the two domains. Furthermore, exploring the nature of this overlap provides a novel way to explore the cognitive and neural foundations of human syntactic abilities.
Before embarking, it is worth addressing the question, what is “musical syntax?” because this term may mean different things to different scholars. In this chapter, syntax in music (just as in language) refers to the principles governing the combination of discrete structural elements into sequences. The vast majority of the world’s music is syntactic, meaning that one can identify both perceptually discrete elements (such as tones with distinct pitches or drum sounds with distinct timbres) and norms for the combination of these elements into sequences. These norms are not “rules” that musicians must obey. On the contrary, composers and performers can and do purposely contravene these norms for artistic purposes. However, such departures are meaningful precisely because there are norms against which they operate. The cognitive significance of the norms is that they become internalized by listeners, who develop expectations that influence how they hear music. Thus the study of syntax deals not only with structural principles but also with the resulting implicit knowledge a listener uses to organize musical sounds into coherent patterns.
As with language, the syntax of music varies across cultures and historical eras. Unlike language, however, in which a number of important syntactic features are shared by all human languages (Van Valin, 2001), syntactic universals in music appear to be limited to a few very general features such as the organization of pitch in terms of musical scales with (typically) 5 to 7 tones per octave (cf. Chapter 2). Such universals hardly provide a basis for a detailed comparison of linguistic and musical syntax.1 This lack of syntactic unity in human music should not be surprising. Unlike language, music is not constrained to transmit a certain kind of information, so that the range of sonic structures considered “music” by at least some people reflects the vast and ever-growing diversity of human aesthetic creativity and interest.
Meaningful comparison of linguistic and musical syntax thus requires focus on the music of a particular period and style. I have chosen to focus on Western European tonal music (or “tonal music” for short), a music that flourished between about 1650 and 1900 and whose syntactic conventions have been influential since that time. (In this chapter, the term “tonality” is sometimes used as a shorthand term for these conventions.) For example, most of the music heard in Europe and the Americas today is tonal music. Another reason to focus on this tradition is that of all known musical systems, it is the most extensively studied from both a theoretical and an empirical perspective (e.g., Krumhansl, 1990; Lerdahl, 2001).
Does musical syntax really merit comparison with linguistic syntax? The simple fact that a nonlinguistic system is syntactic does not guarantee an interesting comparison with language. For example, the songs of the swamp sparrow Melospiza georgiana are made up of a few acoustically discrete elements (“notes”), and different geographic populations order these notes in different ways to form larger chunks (“syllables”) that are repeated in time to create a song (Figure 5.1).
Elegant experiments have shown that these syntactic differences are learned and are meaningful to the birds. Indeed, they serve as the basis of geographic song “dialects” that the birds use in identifying potential competitors or mates (Balaban, 1988; cf. Thompson & Bakery, 1993). Yet such a system can hardly sustain a meaningful comparison with linguistic syntax. Linguistic syntax is remarkable for its structural richness, attaining a level of complexity that sets it apart from any known nonhuman communication system.
One aspect of this richness is multilayered organization. There are principles for the formation of words from meaningful subunits, or “morphemes” (such as the use of the suffix “-ed” in English to form the regular past tense), for the formation of phrases from words (such as noun phrases and prepositional phrases), and for the formation of sentences from phrases. Furthermore, sentence formation includes principles of recursive structure (such as embedding one noun phrase within another) that appear to set human language apart from nonhuman animal communication systems (Hauser et al., 2002; though see Gentner et al., 2006).
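The recursive embedding just mentioned can be illustrated with a toy grammar fragment; the vocabulary, function names, and depth limit are invented for illustration and do not come from the text.

```python
import random

def noun_phrase(depth):
    """A noun phrase, optionally embedding another noun phrase inside a
    prepositional phrase (the recursion bottoms out at depth 0)."""
    head = random.choice(["the cat", "the man", "the girl"])
    if depth == 0:
        return head
    return f"{head} {prep_phrase(depth - 1)}"

def prep_phrase(depth):
    return f"{random.choice(['with', 'near'])} {noun_phrase(depth)}"

random.seed(0)
example = noun_phrase(2)  # contains exactly two embedded prepositional phrases
```

Because each prepositional phrase contains a full noun phrase, the same two rules generate arbitrarily deep nestings such as "the man with the girl near the cat."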
Figure 5.1 Swamp sparrow song organization. (a) Examples of the six categories of minimal acoustic elements (notes) that make up swamp sparrow songs. Scale at the left represents frequency (1–8 kHz); time scale bar on the bottom is 100 ms. (b) From two to six (most commonly three or four) notes are put together to form syllables. Two syllables from two different New York songs are shown. Birds in a given geographic location have preferences for placing certain notes in certain positions in a syllable; this constitutes the syntax of a song. (c) Swamp sparrow syllables are repeated to form a ~2-sec song. The two songs depicted here consist of repetitions of the two syllables detailed in (b). From Balaban, 1988.
Another aspect of the structural richness of linguistic syntax is the strong relationship between syntax and meaning, so that changes in the order of words (and/or the identity of grammatical morphemes) can greatly alter the meaning of an utterance. For example, “The man with the thin cane saw the girl” means something quite different from “The thin girl with the cane saw the man.” Peter Marler (2000) has pointed out that this aspect of human syntax sets it apart from other vertebrate syntactic systems such as bird song and whale song, in which the meaning of the sequence is not intricately related to the order in which the elements occur. Instead, current evidence suggests that these nonhuman vocal displays always mean the same thing: territorial warning and sexual advertisement. In these simple syntactic systems, the order of the elements simply identifies the caller as a member of a particular species or group.
A third and very important aspect of the richness of linguistic syntax is the fact that words can take on abstract grammatical functions (such as subject, direct object, and indirect object) that are determined by their context and structural relations rather than by inherent properties of the words themselves (Jackendoff, 2002). For example, there is nothing about the word “cat” that makes it a subject, object, or indirect object, yet in a sentence context it can take on one of these functions, and as a consequence trigger syntactic phenomena at other locations such as subject-verb agreement for number.
The next four sections discuss the syntax of tonal music, illustrating that musical structure has many of the key features that make linguistic syntax so rich.
Like language, tonal music has syntactic principles at multiple levels. The following sections focus on three levels of pitch organization. As background, recall from Chapter 2 that the basic pitch materials of tonal music are drawn in an orderly way from the continuum of physical frequencies. Specifically, each octave (doubling in frequency) contains 12 tones such that the frequency ratio between each tone and the tone below it is constant. This basic ratio, called the semitone, is about 6%, equal to the pitch distance between an adjacent black and white key on a piano. Also recall from Chapter 2 that tonal music exhibits octave equivalence, whereby pitches whose fundamental frequencies are related by a 2/1 ratio are perceived as highly similar in pitch and are thus given the same letter name or pitch class irrespective of the octave in which they occur. These 12 pitch classes are given letter names: A, A♯, B, C, C♯, D, D♯, E, F, F♯, G, G♯, in which ♯ = “sharp” (or equivalently, A, B♭, B, C, D♭, D, E♭, E, F, G♭, G, A♭, in which ♭ = “flat”). The octave in which a note occurs is indicated by a number after its pitch class name; for example, 220 Hz corresponds to A3, whereas 440 Hz is A4.
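These naming conventions can be sketched computationally. The sketch assumes the standard A4 = 440 Hz reference and MIDI-style octave numbering, which matches the A3/A4 examples above; it uses sharps-only spelling for simplicity.

```python
import math

# Note names ordered so that the octave number increments at C (the
# convention under which A3 = 220 Hz and A4 = 440 Hz).
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz):
    """Nearest equal-tempered note name for a frequency, with A4 = 440 Hz."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))  # MIDI note number
    return f"{NAMES[midi % 12]}{midi // 12 - 1}"

# One semitone is a frequency ratio of 2**(1/12), i.e. roughly a 6% step.
SEMITONE = 2 ** (1 / 12)
```

For example, `note_name(220.0)` and `note_name(440.0)` return the same pitch class in adjacent octaves, which is octave equivalence expressed in code.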
The most basic level of syntactic organization of pitch concerns musical scales (cf. Chapter 2, section 2.2.2 for background on musical scales). In tonal music the pitches played at any given moment are not uniformly distributed across the 12 possible pitch classes per octave but are instead constrained by a musical scale, a subset of 7 tones (or “scale degrees”) per octave with an asymmetric pattern of pitch spacing (“intervals”) between them. One such scale (the C major scale) is shown in Figure 5.2.
As noted in Chapter 4 (section 4.2.6, which should be read as background for this section), in a musical context, the different scale tones take on different roles in the fabric of the music, with one tone being the structurally most central and stable (the “tonic”). The use of one pitch as a tonal center is not restricted to Western European tonal music, but appears repeatedly in diverse musical traditions. This suggests that the organization of pitch around a tonic may be congenial to the human mind, perhaps reflecting the utility of psychological reference points in organizing mental categories2 (Rosch, 1975; Krumhansl, 1979; cf. Justus & Hutsler, 2005).
An interesting aspect of scale structure in tonal music is that listeners’ intuitions of the stability of scale degrees is not simply binary, with the tonic being stable and all other tones being equally less stable. Instead, empirical evidence suggests that there is a hierarchy of stability. An important aspect of this hierarchy is the contrast in stability between the tonic and its neighboring scale tones (scale degrees 2 and 7), which creates a psychological pull toward the tonal center. This is reflected in the music-theoretic names for the 2nd and 7th scale degrees: The second is called the “supertonic” (i.e., the tone just above the tonic), and the seventh is known as the “leading tone” (i.e., the tone that leads to the tonic). In an early set of studies, Robert Francès (1988) provided evidence for the “pull to the tonic” by demonstrating that listeners were less sensitive to upward mistunings of the leading tone when it was in an ascending melodic context, in other words, when the mistuning brought it closer to the tonic.
Another approach to the mental representation of scale structure concerns the perceived relatedness (or mental distance) between different tones in a scale. Krumhansl (1979) explored this issue using a paradigm in which listeners first heard a tonal context (e.g., an ascending or descending C major scale) and then heard two comparison tones from the scale. The task was to judge how closely related the first tone was to the second tone in the tonal system suggested by the context. The results of this task were analyzed using multidimensional scaling. This technique translates judgments of relatedness into a spatial display so that the closer the perceived relatedness, the closer the elements are in the resulting graph. The three-dimensional solution for the similarity ratings is shown in Figure 5.3.
Figure 5.2 The C-major musical scale. The small numerals between the notes on the musical staff indicate the size of pitch intervals in semitones. (2 st = a major second, 1 st = a minor second.) The interval pattern [2 2 1 2 2 2 1] defines a major scale. C′ is the pitch one octave above C. Modified from Cuddy et al., 1981.
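The interval pattern [2 2 1 2 2 2 1] in the caption above suffices to spell a major scale from any tonic, as this sketch shows (sharps-only spelling, so enharmonic choices such as B♭ versus A♯ are ignored).

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_PATTERN = [2, 2, 1, 2, 2, 2, 1]  # semitone steps, as in Figure 5.2

def major_scale(tonic):
    """Spell a major scale from the tonic up to the tonic an octave above."""
    idx = NAMES.index(tonic)
    scale = [tonic]
    for step in MAJOR_PATTERN:
        idx = (idx + step) % 12
        scale.append(NAMES[idx])
    return scale
```

Starting the same pattern on a different tonic yields a transposed scale with the same interval structure, e.g. `major_scale("G")` contains F♯ where the C major scale contains F.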
As can be seen, scale degrees 1, 3, and 5 of the C major scale (C, E, and G) are perceived as closely related, whereas the remaining scale tones are less closely related, and nonscale tones are distantly related. One very notable feature of this figure is the large distance that separates tones that are adjacent in frequency, such as C and C♯. This contrast between the physical and psychological proximity of pitches is likely to be part of what animates tonal music.
Listeners appear to be quite sensitive to scale structure in tonal music, as evidenced by the familiar phenomenon of the “sour note,” as discussed in Chapter 4, section 4.2.6. At what age does this sensitivity emerge? Trainor and Trehub (1992) examined 8-month-old infants’ and nonmusician adults’ ability to detect two types of changes in a repeating 10-note melody. (The melody was transposed on each repetition, so the task involved discerning a change in pitch relationships rather than simply detecting an absolute pitch change.) In both cases, one note in the middle of the melody was changed: In one case, it was raised by four semitones, but remained within the scale of the melody; in another case, it was raised by just one semitone, but now departed from the scale of the melody. Thus the two changes cleverly pitted physical distance against scale membership. Infants detected both kinds of changes equally well.
Figure 5.3 Geometrical representation of perceived similarity between musical pitches in a tonal context. The data are oriented toward the C major scale, in which C serves as the tonic. C’ is the pitch one octave above C. From Krumhansl, 1979.
Adults performed better than infants overall, but crucially, they detected the change that violated scale structure significantly better than the within-scale change, even though the former change was a smaller physical change than the latter. This reflects the fact that for the adults the nonscale tone “popped out” as a sour note. These results show that the infants had not yet developed an implicit knowledge of scale structure, as one might expect. It is interesting to note that infants have already started acquiring learned sound categories for language at 10 months (e.g., the vowels of their native language; Kuhl et al., 1992), which may reflect the greater amount of linguistic versus musical input that infants have experienced by that age.
What is the earliest age that sensitivity to scale membership can be demonstrated? In a follow-up study using a very similar paradigm, Trainor and Trehub (1994) showed that 5-year-old children with no formal training in music detected out-of-scale melodic alterations better than within-scale alterations, even though the former changes were physically smaller than the latter.3 This naturally raises the question of the ontogeny of scale-membership sensitivity between 10 months and 5 years. Because behavioral tests with young children can be difficult, it may be preferable to use event-related brain potentials (ERPs) in such studies. ERPs do not require a behavioral response from the listener, and distinct ERP responses to out-of-scale notes have been observed in adults (e.g., Besson & Faïta, 1995).
A very important aspect of tonal music’s syntax is the simultaneous combination of scale tones into chords, creating harmony. Chords are formed in principled ways: basic “triads” are built from scale degrees separated by musical thirds, in other words, by a distance of two scale steps. Because of the asymmetric interval structure of Western scales, a distance of two scale steps can correspond to a distance of either three or four semitones, in other words, to a minor or major third. For example, in the C major scale (cf. Figure 5.2), the chord C-E-G consists of a major third between C and E and a minor third between E and G, whereas D-F-A consists of a minor third between D and F and a major third between F and A. These two chords represent “major” and “minor” triads, respectively. As shown in Figure 5.4, building triads from a major scale results in three major triads (built on scale degrees, 1, 4, and 5), three minor triads (built on scale degrees 2, 3, and 6), and one “diminished” triad in which both intervals are minor thirds (built on scale degree 7).
In chordal syntax, one tone of each chord acts as its “root” or structurally most significant pitch. This is the lowest note in each triad in Figure 5.4, and is the note that gives the chord its name as well as its Roman numeral harmonic label. For example, in Figure 5.4, the chord with a root of E (the 3rd scale degree) is an E-minor chord (E-G-B), with a harmonic label of “iii.” (The use of a lower case roman numeral indicates that this chord is a minor chord.) Similarly, in Figure 5.4, the chord with a root of G (the 5th scale degree) is a G major chord (G-B-D), with a harmonic label of V. (The use of an upper case roman numeral indicates that this chord is a major chord.) Even when the notes of a triad occur in a different vertical ordering, the root and harmonic label remain the same, and the chord is treated as having the same basic harmonic status. Thus C-E-G and G-E-C both have C as the root and are harmonically labeled “I”; the latter is simply considered an “inversion” of the C-E-G chord.
Figure 5.4 Basic triadic chords of the C-major musical scale. Small numbers between the notes of a given chord indicate the interval in semitones between notes. Major, minor, and diminished chords are indicated in the “chord names” and “harmonic labels” lines by fonts: uppercase = major, lower case = minor, lower case with “o” superscript = diminished. The musical note in parentheses at the top of the V chord, if included in the chord, creates a seventh chord (V7, or G7 in this case). Modified from Cuddy et al., 1981.
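The triad qualities shown in Figure 5.4 follow directly from stacking two thirds on each scale degree, as this sketch illustrates (degree numbering here is 0-based, so degree 0 is the tonic; the quality labels follow the caption's conventions).

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones of the seven major-scale degrees

def triad_quality(degree):
    """Quality of the triad built on a 0-based scale degree (0 = tonic)."""
    root = MAJOR_STEPS[degree]
    third = MAJOR_STEPS[(degree + 2) % 7] + (12 if degree + 2 >= 7 else 0)
    fifth = MAJOR_STEPS[(degree + 4) % 7] + (12 if degree + 4 >= 7 else 0)
    # Classify by the pattern of the two stacked thirds, in semitones:
    # major = 4+3, minor = 3+4, diminished = 3+3.
    return {(4, 3): "major", (3, 4): "minor", (3, 3): "diminished"}[
        (third - root, fifth - third)
    ]

qualities = [triad_quality(d) for d in range(7)]
```

The computed list reproduces Figure 5.4: scale degrees 1, 4, and 5 yield major triads, degrees 2, 3, and 6 yield minor triads, and degree 7 yields the diminished triad.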
Chord syntax also includes principles for modifying triads with additional tones. For example, one very common modification is to add a fourth tone to a triad to convert it to a “seventh” chord, so called because the added tone is seven scale steps above the root of the chord. For example, in a C major scale, the chord G-B-D-F would be a seventh chord built on the root G, or G7 (cf. Figure 5.4), and its harmonic label would be V7. Seventh chords play an important role in chord progressions, by implying forward motion toward a point of rest that has not yet been reached.
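The construction of diatonic triads, their Roman numeral labels, and the seventh chord described above can be sketched in a few lines of Python. This is a toy illustration with invented helper names, not code from any cited source:

```python
# Toy sketch of diatonic chord construction in C major (helper names invented).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]      # major-scale interval pattern (semitones)
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII"]

def build_scale(tonic_pc, steps):
    """Pitch classes of a scale, starting from the tonic pitch class."""
    pcs, pc = [tonic_pc], tonic_pc
    for step in steps[:-1]:
        pc = (pc + step) % 12
        pcs.append(pc)
    return pcs

def stack_thirds(scale_pcs, degree, size=3):
    """Stack diatonic thirds above a 0-based scale degree (size=4 gives a seventh chord)."""
    return [scale_pcs[(degree + 2 * k) % 7] for k in range(size)]

def harmonic_label(chord, degree):
    """Roman numeral label: case encodes major/minor; 'o' marks diminished."""
    third = (chord[1] - chord[0]) % 12
    fifth = (chord[2] - chord[0]) % 12
    if third == 4:                                   # major third above the root
        return ROMAN[degree]
    return ROMAN[degree].lower() + ("o" if fifth == 6 else "")

c_major = build_scale(0, MAJOR_STEPS)                # [0, 2, 4, 5, 7, 9, 11]
labels = [harmonic_label(stack_thirds(c_major, d), d) for d in range(7)]
print(labels)                                        # ['I', 'ii', 'iii', 'IV', 'V', 'vi', 'viio']
g7 = stack_thirds(c_major, 4, size=4)                # the V7 chord of Figure 5.4
print([NOTE_NAMES[pc] for pc in g7])                 # ['G', 'B', 'D', 'F']
```

The chord qualities fall out of the interval arithmetic alone: stacking thirds within the scale yields major, minor, or diminished triads depending on where the degree sits in the interval pattern, matching the label rows of Figure 5.4.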
The above discussion of chord syntax concerns the “vertical” organization of tones in music. Another important aspect of chord syntax concerns the “horizontal” patterning of chords in time. In tonal music, there are norms for how chords follow one another (Piston, 1987; Huron, 2006), and these norms play a role in governing the sense of progress and closure in musical phrases. A prime example of this is the “cadence,” a harmonic resting point in music. An “authentic cadence” involves movement from a V chord (or a V7 chord) to a I chord and leads to a sense of repose. Moving beyond this simple two-chord progression, some longer chord progressions can be identified as prototypical in tonal music, such as I-V-I, I-IV-V-I, I-ii-V-I, and so on. One of the governing patterns behind these progressions is the “cycle of fifths” for chords, a sequence in which the roots of successive chords are related by descending fifths. In its entirety, the progression is I-IV-vii°-iii-vi-ii-V-I. Smith and Melara (1990) have shown that even musical novices are sensitive to syntactic prototypicality in chord progressions, showing that implicit knowledge of these progressions is widespread among listeners.
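The chordal cycle of fifths just described can be generated mechanically: a descending diatonic fifth moves the root down four scale steps, which is the same as up three steps mod 7. A minimal sketch (function name my own):

```python
# Generate the chordal cycle of fifths in a major key: each successive root
# lies a descending diatonic fifth below the last (down 4 scale steps = +3 mod 7).
DIATONIC_LABELS = ["I", "ii", "iii", "IV", "V", "vi", "viio"]

def cycle_of_fifths(n_chords=8):
    degree, cycle = 0, []
    for _ in range(n_chords):
        cycle.append(DIATONIC_LABELS[degree])
        degree = (degree + 3) % 7        # descending fifth, in scale-degree terms
    return "-".join(cycle)

print(cycle_of_fifths())                 # I-IV-viio-iii-vi-ii-V-I
```

Starting from I and repeatedly descending by diatonic fifths visits every diatonic chord exactly once before returning to the tonic, reproducing the progression given in the text.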
Chord sequences are also important in melody perception, in which the chords are implied by important melody tones rather than explicitly played as simultaneities of tones (cf. Chapter 4, section 4.2.8). Listeners are sensitive to this implied harmony. Cuddy et al. (1981) have shown that melodic sequences that imply prototypical chord sequences are better remembered than other sequences. Furthermore, Trainor and Trehub (1994) have shown that musically unselected adults are more sensitive to melodic changes that violate the implied harmony than to physically larger changes that remain within the implied harmony (cf. Holleran et al., 1995).
Like the tones of the scale, different chords built on the 7 different scale degrees are not equal players in musical contexts. Instead, one chord (the tonic chord, built on the 1st scale degree) is the most central, followed by the dominant chord (built on the 5th scale degree) and the subdominant chord (built on the 4th scale degree). Informal evidence for the structural importance of the tonic, subdominant, and dominant chords (harmonically labeled as I, IV, and V chords) comes from the fact that many popular and folk songs can be played using just these three chords as the underlying harmony. More formal evidence comes from a study by Krumhansl et al. (1982) in which a musical context (an ascending scale) was followed by two target chords. Listeners were asked to judge how well the second chord followed the first in the context of the preceding scale. The judgments were then subjected to multidimensional scaling in order to represent perceived relatedness as spatial proximity. Figure 5.5 shows the multidimensional scaling solution, and reveals that chords I, IV, and V form a central cluster around which the other chords are arrayed.
Figure 5.5 Psychological relatedness of different chords in a musical context. Chords are indicated by their harmonic labels, with uppercase Roman numerals used in a generic fashion (i.e., major, minor, and diminished chords are not distinguished). From Krumhansl et al., 1982.
A scale and its tonal hierarchy, plus its system of chords and chord relations, defines a “key” or tonal region in Western European music. Because there are 12 pitch classes in tonal music, each of which can serve as the tonic of a scale, and because there are two commonly used scale structures (major and minor), there are 24 keys in tonal music. Keys are named for their principal note and their scale structure, for example, C major, B minor. Thus keys and scales are named in a similar way, which may be one reason that people sometimes confuse the two.
A great deal of tonal music moves between keys during the course of a composition. These “modulations” of key allow a composer to explore different tonal regions and add diversity to the tonal journey outlined by a piece. Part of the syntax of tonal music is the pattern of key movement in music, which is far from a random walk between the 24 possible keys. Instead, modulations tend to occur between related keys, in which relatedness is defined in particular ways. Major keys are considered closely related if they share many of their basic scale tones. For example, the notes of the C major scale (C, D, E, F, G, A, B, C) and the G major scale (G, A, B, C, D, E, F♯, G) differ only in terms of one pitch class. (Recall that a major scale is obtained by starting on one note and choosing subsequent tones according to the major scale interval pattern [2 2 1 2 2 2 1].) Generalizing this relationship, any two keys whose 1st scale degrees are separated by a musical fifth are closely related, because their scales share all but one pitch class. This pattern of relations can be represented as a “circle of fifths” for major keys (Figure 5.6).
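The claim that fifth-related major keys share all but one pitch class is easy to check computationally, using the interval pattern quoted above (a sketch; helper names are my own):

```python
# Verify that major keys whose tonics lie a fifth (7 semitones) apart share
# exactly six of their seven pitch classes, at every position on the circle.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]      # major-scale interval pattern

def major_pitch_classes(tonic_pc):
    pcs, pc = {tonic_pc}, tonic_pc
    for step in MAJOR_STEPS[:-1]:
        pc = (pc + step) % 12
        pcs.add(pc)
    return pcs

for tonic in range(12):
    shared = major_pitch_classes(tonic) & major_pitch_classes((tonic + 7) % 12)
    assert len(shared) == 6              # all but one pitch class in common

c, g = major_pitch_classes(0), major_pitch_classes(7)   # C major vs. G major
print(sorted(c - g), sorted(g - c))      # [5] [6]: F is replaced by F#
```

The same loop run with any other tonic interval shows smaller overlaps, which is why adjacency on the circle of fifths tracks key relatedness.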
Music theory also suggests that each major key is also closely related to two different minor keys. One is the “relative minor,” which shares the same notes of the scale but has a different tonic. For example, A-minor (A, B, C, D, E, F, G, A) is the relative minor of C major, because it has all the same pitch classes. (Recall that the minor scale is obtained by starting on one note and choosing subsequent notes by following the minor-scale interval pattern of [2 1 2 2 1 2 2].) The other minor key related to a given major key is the “parallel minor,” which shares the same tonic but has different scale tones. Thus C-minor (C, D, E♭, F, G, A♭, B♭, C) is the parallel minor of C major.
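Both relations can be verified directly from the two interval patterns quoted above (a minimal sketch; helper names are my own):

```python
# Relative minor: same pitch classes, different tonic.
# Parallel minor: same tonic, different pitch classes.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]
MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]      # natural-minor interval pattern

def pitch_classes(tonic_pc, steps):
    pcs, pc = {tonic_pc}, tonic_pc
    for step in steps[:-1]:
        pc = (pc + step) % 12
        pcs.add(pc)
    return pcs

c_major = pitch_classes(0, MAJOR_STEPS)  # C major
a_minor = pitch_classes(9, MINOR_STEPS)  # A minor, the relative minor
c_minor = pitch_classes(0, MINOR_STEPS)  # C minor, the parallel minor

print(a_minor == c_major)                # True: identical pitch-class sets
print(len(c_major & c_minor))            # 4: only C, D, F, G are shared
```

The relative minor is thus maximally close to its major key in pitch-class terms, whereas the parallel minor shares only the tonic, second, fourth, and fifth degrees.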
Figure 5.6 The circle of fifths for major keys. Each key is represented by a letter standing for its tonic. Keys that are adjacent on the circle share all but one pitch class.
One way to represent this pattern of relationship among musical keys is via a geometric diagram in which psychological distance between keys is reflected by spatial distance. One such diagram, proposed by Krumhansl and Kessler (1982) on the basis of perceptual experiments, is shown in Figure 5.7. An important aspect of this two-dimensional diagram is that the left and right edges are equivalent, as are the top and bottom edges. That is, the map is actually an unfolded version of a shape that is circular in both dimensions (a torus), reflecting the circular nature of perceived key relations.
An interesting form of evidence for implicit knowledge of key distances comes from experiments in which listeners hear a melody followed by a transposed version of the same melody and must judge whether the two melodies are the same or different. A number of researchers have found that performance on this task is better if the melody is transposed to nearby versus a distant key (e.g., Cuddy et al., 1981; Trainor & Trehub, 1993; cf. Thompson & Cuddy, 1992). There is also neural evidence for implicit knowledge of key structure. When listening to a chord sequence in a particular key, an “alien” chord from a distant key produces a larger P600 (an event-related brain potential elicited by structural incongruity) than an alien chord from a nearby key, even when both chords contain the same number of out-of-key notes (Patel, Gibson, et al., 1998). Janata et al. (2002) have also provided evidence for maps of key distance in the brain, using the technique of functional magnetic resonance imaging (fMRI).
Figure 5.7 A map of psychological distances between musical keys. Major keys are indicated by uppercase letters and minor keys by lowercase letters. Dashed lines extending from the key of C major indicate related keys: two adjacent major keys along the circle of fifths (G and F; cf. Figure 5.6) and two related minor keys (see text for details). Modified from Krumhansl & Kessler, 1982.
One of the principal features of linguistic syntax is that relationships between words are not simply based on nearest neighbor relations. For example, consider the sentence, “The girl who kissed the boy opened the door.” Although the sentence contains the sequence of words “the boy opened the door,” a speaker of English knows that the boy did not do the opening. This is because words are not interpreted in a simple left-to-right fashion, but via their combination into phrases and the combination of phrases into sentences. Figure 5.8 shows a syntactic tree diagram for this sentence specifying the hierarchical organization of words in relation to each other.
As with language, structural relations in tonal music are not merely based on adjacency. Instead, events are organized in a hierarchical fashion. The structure of these hierarchies is a major focus of modern cognitively oriented music theory, as exemplified by Lerdahl and Jackendoff’s generative theory of tonal music (GTTM). Figure 5.9 shows a hierarchical structure for the tones in a musical passage according to GTTM. The details of this syntactic tree will be explained in the next section. For now, two conceptual points need to be made. First, this type of hierarchy is an “event hierarchy,” which describes structural relations in a particular sequence of music. This must be clearly distinguished from the “pitch hierarchies” described in previous sections, which concern overall, atemporal aspects of the tonal musical style, for example, the fact that scale degree 1 is the most structurally stable tone in the scale (Bharucha, 1984a). Pitch hierarchies are only one factor that influences the construction of event hierarchies. The second point is that music theory posits two types of event hierarchies for musical sequences, describing different kinds of structural relations between musical events (Lerdahl & Jackendoff, 1983; cf. Jackendoff & Lerdahl, 2006). These will be discussed in turn below.
Figure 5.8 The hierarchical syntactic structure of an English sentence. (S = sentence; NP = noun phrase, VP = verb phrase, S′ = sentence modifier [relative clause], N = noun; V = verb; Det = determiner; Rel-Pro = relative pronoun.) Within the clause, the relative pronoun “who” is referred to as a filler and is interpreted as the actor for the verb “kissed.” This relationship is identified by the presence of a coindexed empty element ei in the subject position of the relative clause. Modified from Patel, 2003b.
The concept that some pitches serve to elaborate or ornament others is central to Western European music theory (see, e.g., the theories of Schenker, 1969; Meyer, 1973; and Lerdahl & Jackendoff, 1983; cf. Cook, 1987a). The ability to recognize a familiar tune in a richly ornamented jazz version, or more generally, the ability to hear one passage as an elaborated version of another, implies that not all events in a musical sequence are perceived as equally important. Instead, some events are heard as more important than others. Note that calling some pitches “ornamental” is not meant to imply that they are aesthetically more or less important than other pitches. Rather, the distinction is meant to capture the fact that not all pitches are equal in forming the mental gist of a musical sequence. It is also worth noting that the concept of melodic elaboration is not unique to Western European tonal music (cf. Chapter 4, section 4.2.7).
One particularly clear treatment of event hierarchies of structure and ornamentation is Lerdahl and Jackendoff’s theory of “time-span reduction.” The tree in Figure 5.9a is a time-span reduction of two phrases of a musical melody, showing the hierarchy of structural importance for the tones in this passage. Shorter branches terminate on less important pitches, whereas longer branches terminate on more important pitches.4
Construction of such a tree requires decisions about which pitches are more structural than others; these decisions are influenced by tonal hierarchies, but also take rhythmic and motivic information into account. The use of a tree structure to indicate structure versus ornament (rather than a simple binary scheme whereby each pitch is either structural or ornamental) is based on the hypothesis that music is organized into structural levels, so that a pitch that is structural at one level may be ornamental at a deeper level. Thus taking a cross section of the tree at any particular height leaves one with the dominant events at that level (cf. Figure 5.9b). The trees are thus meant to model an experienced listener’s intuitions about levels of relative structural importance of tones.
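The idea of cross-sectioning a reduction tree can be caricatured with a flat list of (note, level) pairs. This is a toy illustration with invented values, not Lerdahl and Jackendoff's actual algorithm, which derives levels from tonal, rhythmic, and motivic criteria:

```python
# Toy time-span "reduction": each event carries an invented structural level;
# a cross section at height h keeps only the events dominant at that level,
# mimicking the successively reduced staves of Figure 5.9b.
melody = [("C", 3), ("D", 1), ("E", 2), ("F", 1), ("G", 3), ("A", 1), ("G", 2)]

def cross_section(events, h):
    return [note for note, level in events if level >= h]

print(cross_section(melody, 1))   # the full musical surface
print(cross_section(melody, 2))   # ['C', 'E', 'G', 'G']
print(cross_section(melody, 3))   # ['C', 'G']: the structural skeleton
```

Raising the cutoff height strips away ornamental events first, leaving progressively sparser skeletons of the same passage, which is the sense in which a pitch can be structural at one level and ornamental at a deeper one.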
Although the notion of structure versus ornament is deeply ingrained in theories of tonal syntax, empirical studies of this issue have been relatively rare. One important study was conducted by Large et al. (1995), who examined pianists’ improvised variations on children’s melodies, such as the one shown in Figure 5.9. The pianists began by playing a melody from notation, and then produced five simple improvisations on the same melody. Large et al. reasoned that the structural importance of a pitch would be reflected in the number of times it was preserved in the same relative location across the variations. Consistent with this idea, the authors found substantial variation in the extent to which different pitches were retained across improvisations, suggesting that the pianists had a notion of the structural skeleton of the melody. The pattern of pitch retention could be accounted for largely on the basis of the pitch hierarchy of different scale degrees in combination with note duration and degree of metrical accent on a given note (all of which are incorporated in Lerdahl and Jackendoff’s time-span reduction). Thus in these melodies, a typical “elaboration” pitch was one that occurred on a scale degree of low tonal stability, was of short duration, and was not aligned with an accented beat.
Large et al.’s study provides insights on elaboration in music performance, but leaves open the question of the perception of elaboration relations by listeners. Ideally, one would like a measure of the perceived structural importance of each pitch in a sequence, resulting in a tone-by-tone profile that could be analyzed in a quantitative fashion. Such measurements are difficult to make in practice, however, and creative approaches are needed in this area. One relevant study is that of Bharucha (1984b), who used a memory experiment to show that the salience of a tonally unstable note was influenced by its serial position relative to the following note. Specifically, Bharucha demonstrated that an unstable note that is immediately followed by a tonally stable pitch neighbor (e.g., B-C in a C-major context) is less prominent/detectable than an unstable note that is not “anchored” in this way. It is as if the stable tone subordinates the preceding tone as a local ornament and makes it less conspicuous than if that same tone were inserted randomly into the sequence. This suggests that the tonal hierarchy is involved in perceived elaboration relations in music. There is clearly room for more work in this area, however, aimed at generating a note-by-note metric of the perceived structural importance of events in musical sequences.
Figure 5.9 (A) A time-span reduction of the first two phrases of the children’s song “Hush Little Baby.” Shorter branches terminate on less important pitches, whereas longer branches terminate on more important pitches. (B) The lower staves show the dominant events at successively higher levels of tree structure. Modified from Large et al., 1995.
Central to the experience of tonal music is a listener’s sense of tension and resolution as a piece unfolds in time (Swain, 1997). Lerdahl and Jackendoff (1983) refer to this aspect of music as “the incessant breathing in and out of music . . . [whose perception] is at the very heart of musical understanding” (pp. 123, 179). The notion of tension is related to the sense of mobility or openness (i.e., a sense that the music must continue), whereas resolution is associated with repose or rest. Although tension can be conveyed by surface features of music such as loudness and tempo, a very important component of tension in tonal music is the harmonic structure of a piece, in other words, its underlying sequence of chords and keys. These contribute to the pattern of “tonal tension” that arises from relations between harmonic elements in a structured cognitive space.
Lerdahl and Jackendoff devote a major component of their GTTM to the description of tension and relaxation, and propose that tension is organized in a hierarchical fashion. That is, they seek to capture the intuition that local tensing and relaxing motions are embedded in larger scale ones. The formalism they develop to represent the patterning of tension and relaxation is a tree-like structure that they refer to as a “prolongation reduction.” An example of a prolongation reduction is given in Figure 5.10 (cf. Sound Example 5.1).
In this type of tree, right branching indicates an increase in tension and left branching a decrease in tension (i.e., a relaxation). Thus in Figure 5.10, the tree indicates that the first chord locally relaxes into the second, whereas the second chord locally tenses into the third. The fourth chord (the point of maximum tension in the phrase) is the first event originating from a right branch that attaches high up in the tree, and represents an increase in tension at a larger level. Following this chord, local relaxations into chords 5 and 6 are followed by local tensing movements before a more global relaxation, indicated by the left branch connecting chord 6 and the final chord. Note that the construction of trees such as that of Figure 5.10 relies on time-span reduction but is not determined by it: The two kinds of trees can organize the musical surface in different ways (see Lerdahl & Jackendoff, 1983, Chs. 5-9 for details). Thus prolongation reduction is another kind of event hierarchy that relates events (typically chords) to each other in ways that are more complex than simple nearest-neighbor relations.
Figure 5.10 A prolongation reduction of a phrase from a composition by J. S. Bach (Christus, der ist mein Leben). In this type of tree, right branching indicates an increase in tension and left branching a decrease in tension (i.e., a relaxation). The tree shows how local tensing and relaxing motions are embedded in larger scale ones. Modified from Lerdahl, 2001:32.
Evidence that tension is actually perceived in a hierarchical fashion comes from studies in which listeners rate perceived tension as they listen to musical passages (Krumhansl, 1996). Such empirically measured “tension profiles” can then be compared to predictions from numerical models of tonal tension that compute psychological distances between pitches, chords, and keys. One such model is discussed in section 5.4.3 (subsection on tonal pitch space theory). The tonal pitch space (TPS) model can generate predictions based on hierarchical versus purely sequential analyses of tension-relaxation relationships, so that one can determine which type of structure better fits the empirical data. Research in this area has produced apparently contradictory evidence, with some studies favoring a purely sequential structure of perceived tension-relaxation patterns, whereas other studies support a hierarchical structure. Closer examination of these studies suggests these differences may arise from different paradigms used to collect tension ratings. For example, in Bigand and Parncutt’s (1999) study, listeners heard increasingly long fragments of chord sequences and made tension ratings at the end of each fragment. This “stop-tension” task has the advantage of temporal precision, but suffers from an unnatural listening situation that may encourage local rather than global listening. Indeed, Bigand and Parncutt found that the tension profiles of their listeners were well modeled using local harmonic structure (especially cadences), with a negligible contribution of hierarchical structure. In contrast, research using a “continuous-tension” task, in which a listener moves a slider while listening to an ongoing piece, has provided evidence for hierarchical structure in perceived tension patterns (Lerdahl & Krumhansl, 2007; Smith & Cuddy, 2003).
Thus there is evidence that musical tension and relaxation is in fact organized in a hierarchical fashion, though more work is needed in this area.5
As noted earlier in this chapter, there is a strong link between syntax and meaning in language: Changing the order of elements can result in a sequence with very different meaning. This stands in contrast to other vertebrate communication systems such as bird song and whale song, in which there is no evidence for a rich relation between the order of elements and the meaning of the message (Marler, 2000). Turning to human music, if the pattern of tension and resolution in music is taken as one kind of musical meaning, then it is clear that changing the order of musical elements (e.g., rearranging chord sequences) will have a strong impact on meaning via its influence on tension-relaxation patterns. Of course, there is much more to say about musical meaning in relation to language (cf. Chapter 6). The key point here is that musical syntax, like linguistic syntax, exhibits a strong structure-meaning link.
It is evident from the discussion above that tonal music has a rich syntax, but one important issue has not yet been addressed. To what extent does this syntax reflect abstract cognitive relationships between sounds, versus psychoacoustic relationships? Put another way, are syntactic relationships in tonal music merely a natural outcome of the psychoacoustic facts of sound, such as the overtone series and the smoothness or roughness of certain frequency intervals between tones? (Cf. Chapter 2 for a review of the overtone series and the sensory qualities of different frequency intervals.) If so, this would imply that tonal syntax lacks the abstractness of linguistic syntax, because psychological relations between elements reflect physical properties of sounds rather than purely conventional structural relations (cf. Bigand et al., 2006).
The search for a physical basis for tonal syntax dates back to the theorist Rameau (1722), and has had strong advocates ever since. Its appeal is understandable, in that it links music perception to a more basic level of sound perception. The advocates of this view in the last century include Bernstein (1976), who argued that the overtone series provides the foundation for musical scales and for the existence of a tonal center in music. More recently, Parncutt (1989) has provided a sophisticated quantitative analysis of harmony that seeks to understand structural relations between chords in tonal music on the basis of their psychoacoustic properties (see also Huron & Parncutt, 1993; Leman, 1995; Parncutt & Bregman, 2000).
Debates between “physicalist” and “cognitivist” views of musical syntax have existed for some time. For example, the culturally widespread importance of the octave and fifth in musical systems is likely due to universal acoustic/ auditory mechanisms (cf. Chapter 2). On the other hand, in a review of Bernstein’s The Unanswered Question, Jackendoff (1977) points out that the match between the natural overtone series and the details of musical scales (particularly pentatonic scales) is actually not that good, and there are cultures in which musical scales have little relationship to the overtone series, yet whose music still has a tonal center. Back on the physicalist side, Leman (2000) has shown that some of Krumhansl’s probe-tone ratings can be accounted for on the basis of a particular model of auditory short-term memory. Back on the cognitivist side, research on the perception of chord relations that has directly pitted predictions based on psychoacoustics against those based on conventional harmonic relations has found evidence for the latter (Tekman & Bharucha, 1998; Bigand et al., 2003). There is little doubt that this debate will continue. At the current time, the evidence suggests that Western European musical tonality is not a simple byproduct of psychoacoustics, but is not a blatant contradiction of it either. That is, the psychoacoustic properties of musical sounds appear to provide a “necessary but insufficient” basis for tonal syntax (Lerdahl, 2001; cf. Koelsch et al., 2007, for relevant neural data).
One strong piece of evidence for a cognitivist view of tonal syntax is that certain psychological properties of musical elements derive from their context and structural relations rather than from their intrinsic physical features. One such property is the “harmonic function” of different chords. The harmonic function of a chord refers to the structural role it plays in a particular key. For example, consider the two chords G-B-D and C-E-G. In the key of C major, these chords play the role of a V chord and a I chord, respectively, whereas in the key of G major, they play the role of a I chord and a IV chord. The different feeling conveyed by these different functions is illustrated by Sound Examples 5.2a and 5.2b, in which these two chords form the final chords of sequences in C major and G major, respectively. In the former case, the two chords form an authentic cadence (V-I), and bring the phrase to a musical conclusion. In the second case, the same two chords (and thus physically identical sound waves) act as a I-IV progression and leave the phrase sounding unfinished. Numerous studies have shown that musically untrained listeners are sensitive to this syntactic difference. For example, they react more quickly and accurately in judging whether the final chord is mistuned when it functions as a I chord than when it functions as a IV chord (Bigand & Pineau, 1997; Tillmann et al., 1998), and show different brain responses to the same chord when it functions in these two distinct ways (Regnault et al., 2001; cf. Poulin-Charronnat et al., 2006). Bigand et al. (2001) have used such “harmonic priming” experiments to show that the difference between chord functions even influences linguistic processes. They constructed sequences in which chord progressions were sung in four-part harmony, with each chord corresponding to a different nonsense syllable. The listeners’ task was simply to indicate if the final syllable contained the phoneme /i/ or /u/.
The correct responses of both musicians and nonmusicians were faster when the final chord functioned as a I chord rather than as a IV chord.
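The key-relativity of harmonic function can be made concrete with a small illustrative sketch (not from the text; the note inventory and the function name are my own): mapping a chord root onto its Roman-numeral scale degree within a given major key shows how physically identical material receives different syntactic labels in different keys.

```python
# Illustrative toy: the harmonic function of a chord root is relative to the key.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of the 7 scale degrees
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII"]

def harmonic_function(root: str, key: str) -> str:
    """Roman-numeral degree of a chord root within a major key ('?' if out of key)."""
    offset = (NOTE_NAMES.index(root) - NOTE_NAMES.index(key)) % 12
    if offset in MAJOR_SCALE_STEPS:
        return ROMAN[MAJOR_SCALE_STEPS.index(offset)]
    return "?"

# The chord rooted on G (G-B-D) is the V chord in C major but the I chord in G major;
# the chord rooted on C (C-E-G) is the I chord in C major but the IV chord in G major:
print(harmonic_function("G", "C"), harmonic_function("G", "G"))  # V I
print(harmonic_function("C", "C"), harmonic_function("C", "G"))  # I IV
```

A fuller treatment would also distinguish chord quality (major vs. minor triads), which this sketch deliberately ignores.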
Evidence for an even broader level of abstraction regarding harmonic function comes from research examining chord perception in isolation versus in musical contexts. This work suggests that physically different chords are difficult to tell apart in a musical context if they have a similar harmonic function (in the sense of providing a similar prediction for a subsequent chord). Specifically, it has been shown that listeners can easily distinguish a IV chord from a II6 chord in isolation.6 Yet the same listeners find it very difficult to distinguish a I-IV-V chord sequence from a I-II6-V sequence (Hutchins, 2003). This is likely due to the fact that it is quite common for the IV or II6 chord to function in a similar way in tonal music, in other words, as a preparation for a V chord. (Theorists refer to the function played by the IV or II6 chord in this context as the “subdominant function,” named after the IV chord, which is the prototypical chord fulfilling this function; cf. Lerdahl & Jackendoff, 1983:192.) A crucial control condition in this study demonstrated that listeners could distinguish a V from a III6 chord both in isolation and in a musical context (I-V-I vs. I-III6-I), though the difference between these chords is akin to that between the IV and II6 chords. Importantly, the V and III6 chords are not functionally similar in tonal music in terms of predicting a following chord. (For similar evidence on the influence of tonal functions in melody perception, see Cuddy & Lyons, 1981.)
From the preceding discussion, it is clear that musical syntax is structurally complex, suggesting that it can sustain a meaningful comparison with linguistic syntax. Before embarking on such a comparison, it is worth making three final points about musical syntax.
First, syntax allows music to achieve perceptual coherence based on contrast rather than on similarity. A defining feature of music perception is hearing sounds in significant relation to one another rather than as a succession of isolated events (Sloboda, 1985). The most pervasive type of relationship used in music is similarity, such as the similarity created via the repetition or variation of a melodic phrase, or via the similarity of a theme to a stock of musical themes known in a culture (cf. Gjerdingen, 2007). Syntactic organization complements similarity-based coherence by creating coherent patterns based on contrast. For example, a chord progression involves the sequential use of different chords, yet these differences articulate a coherent journey, such as from repose to tension and back to repose. Musical syntax also creates coherence via hierarchical contrasts. For example, by hearing the difference between structural and ornamental tones, a listener can extract the gist of a musical sequence and recognize its similarity to (or contrast with) other passages in the same musical idiom. As suggested by this latter example, much of the richness of tonal music comes from the way that contrast-based and similarity-based cognitive processes interact in forming perceptually coherent patterns.
Second, although the different levels of syntactic organization of music have been described independently above (e.g., scales, chords, and keys), there is a good deal of interaction between levels. For example, for an experienced listener, even one or two chords can suggest a key, and a melody of single tones can suggest an underlying chord structure. Specifying the precise nature of interlevel interactions has been the focus of various models of tonal perception. For example, in one model suggested by Bharucha (1987), tones, chords, and keys are represented as nodes in different layers of an artificial neural network, and activity at one level can propagate to other levels according to the pattern of interlevel connectivity (cf. Tillmann et al., 2000). This model has been able to account for the results of harmonic priming experiments that show how a chord’s processing is influenced by its harmonic relation to preceding chords (Bigand et al., 1999). Another artificial neural network model that addresses the relations between syntactic levels involves a self-organizing map that is used to model how a sense of key develops and shifts over time as a sequence of musical chords is heard (Toiviainen & Krumhansl, 2003).
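The layered tone-chord-key architecture can be sketched as a bottom-up spreading-activation pass. This is a deliberate oversimplification (the triad inventory, the averaging scheme, and all names are mine, not Bharucha's 1987 implementation, which also includes top-down feedback), but it shows how sounding even a few tones can already favor one key over others.

```python
# Toy spreading activation: tones -> chords -> keys (pitch classes: C=0 ... B=11).
TRIADS = {"C": {0, 4, 7}, "D": {2, 6, 9}, "F": {5, 9, 0},
          "G": {7, 11, 2}, "A": {9, 1, 4}}
KEYS = {"C major": ["C", "F", "G"],   # each key listed with its I, IV, and V chords
        "G major": ["G", "C", "D"],
        "D major": ["D", "G", "A"]}

def spread_activation(sounded: set) -> dict:
    """One bottom-up pass: a chord's activation is the fraction of its tones
    currently sounding; a key's activation is the mean over its primary chords."""
    chord_act = {name: len(tones & sounded) / 3 for name, tones in TRIADS.items()}
    return {key: sum(chord_act[c] for c in chords) / 3 for key, chords in KEYS.items()}

acts = spread_activation({0, 4, 7})   # sound a C major triad
print(max(acts, key=acts.get))        # the C major key receives the most activation
```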
Third, a very important issue about tonal syntax concerns the relationship between its acquisition during development and its application during perception. Statistical research suggests that pitch hierarchies among scale tones and chords reflect the relative frequency of different notes and chords in tonal music (Krumhansl, 1990). Thus there is reason to believe that the acquisition of tonal syntax reflects the statistics of a particular musical environment. However, once a musical syntax is acquired, it can be activated by patterns that do not themselves conform to the global statistics that helped form the syntactic knowledge. For example, Bigand et al. (2003) found that listeners reacted more quickly and accurately to mistunings of the final chord of a sequence when it functioned as a I chord versus a IV chord (cf. section 5.2.3 above), even when the preceding chords had more instances of IV chords than I chords. Thus cognitive factors, namely the structural centrality of the I chord in a musical key, prevailed over the frequency of the IV chord in influencing perception and behavior. This provides strong evidence for a syntactic knowledge system that influences how we hear our musical world.
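The statistical-learning idea can be illustrated with a toy corpus tally (the melodies below are invented for illustration; Krumhansl's analyses used note counts from actual repertoire): simply counting pitch-class frequencies in key-consistent melodies yields a ranking headed by the structurally central tones of the key.

```python
from collections import Counter

# Two invented melodies in C major, as pitch classes (C=0, D=2, E=4, F=5, G=7, A=9, B=11)
corpus = [
    [0, 2, 4, 5, 7, 7, 9, 7, 4, 5, 2, 0],
    [7, 4, 0, 4, 7, 0, 11, 0, 7, 4, 0],
]
counts = Counter(pc for melody in corpus for pc in melody)
hierarchy = [pc for pc, _ in counts.most_common()]
print(hierarchy[:3])   # the tonic-triad tones (C, E, G) head the ranking
```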
The preceding sections provide the background for comparing linguistic and musical syntax in formal terms. Such a comparison is meaningful because language and music are rich syntactic systems that are not trivial variants of one another. Formal similarities between the systems are best appreciated in light of a clear understanding of formal differences, to which I turn first.
Perhaps the most obvious difference between syntax in the two domains is the presence of grammatical categories in language, such as nouns, verbs, and adjectives, that have no counterparts in music. The attempt to find musical analogs for such categories is a trap that Bernstein fell into in The Unanswered Question, and is part of what Lerdahl and Jackendoff correctly labeled as “an old and largely futile game.” Another example of uniquely linguistic syntactic entities is the set of grammatical functions that words play in sentences, in other words, subject, direct object, and indirect object (Jackendoff, 2002). Searching for direct musical equivalents of such functions is a misguided enterprise.
Beyond these differences in category identity, the hierarchical organization of grammatical categories in sequences also shows important differences in the two domains. Syntactic trees in language, such as that in Figure 5.8, convey the relationship of constituency: A determiner plus a noun is a noun phrase, a noun phrase plus a verb phrase is a sentence, and so forth. Syntactic trees in music, such as those in Figures 5.9 and 5.10, are not constituent trees. In the case of structure-elaboration trees (time-span reduction trees in GTTM), the branches of the tree join in ways that indicate which events are more structurally important. In tension-relaxation trees (prolongation-reduction trees in GTTM), the branching pattern indicates whether an event represents a tensing or a relaxing movement in relation to another event.7
Another difference between linguistic and musical syntax concerns long-distance dependencies. Such relations, such as between “girl” and “opened” in Figure 5.8, are ubiquitous in language and every normal listener can be assumed to perceive them (Chomsky, 1965). In contrast, the long-distance relations posited by musical syntax, such as the relations embodied in tension-relaxation trees (Figure 5.10), cannot simply be assumed to be perceived and are better viewed as hypotheses subject to empirical test, for example, using the tension-rating experiments described in section 5.2.2 (subsection on tension and resolution). Put another way, a particular sequence of notes or chords does not constrain perceived dependencies to the same degree as a particular sequence of words, suggesting that words have more intricate syntactic features built into them than do notes or chords. (For example, the mental representation of a verb is thought to include its syntactic category and thematic role information, in addition to its semantic meaning; cf. Levelt, 1999.)
One final formal difference that can be mentioned concerns the role of syntactic ambiguity in the two domains. In language, syntactic ambiguity is generally eschewed by the cognitive system, which seeks to arrive at a single structural analysis for a sentence. For example, in the sentence “The fireman left the building with the large sign,” most individuals will parse the sentence so that “the large sign” is a modifier of “the building” rather than of “the fireman,” even though the sentence is structurally ambiguous. In contrast, there is much greater tolerance for syntactic ambiguity in music. Krumhansl (1992) notes that “a chord may be heard simultaneously in its multiple roles in different keys, with the effect that modulations between closely related keys are easily assimilated. The role of a chord need never be disambiguated” (p. 199). Examples of such “pivot chords” abound in tonal music, and show that music not only tolerates syntactic ambiguity, it exploits it for aesthetic ends.
Having reviewed several important formal differences between musical and linguistic syntax, we can now turn to similarities. As reviewed in section 5.2.1, one similarity is the existence of multiple levels of organization. In language, there are syntactic principles that guide how basic lexical subunits (morphemes) are combined to form words, how words are combined to form phrases, and how phrases are combined to form sentences. In music, there are syntactic principles that govern how tones combine to form chords, how chords combine to form chord progressions, and how the resulting keys or tonal areas are regulated in terms of structured movement from one to another. In both domains, this multilayered organization allows the mind to accomplish a remarkable feat: A linear sequence of elements is perceived in terms of hierarchical relations that convey organized patterns of meaning. In language, one meaning supported by syntax is “who did what to whom,” in other words, the conceptual structure of reference and predication in sentences. In music, one meaning supported by syntax is the pattern of tension and resolution experienced as music unfolds in time.
As noted earlier, the hierarchical organization of linguistic and musical syntactic structures follows different principles. Specifically, linguistic syntactic trees embody constituent structure, whereas musical syntactic trees do not (cf. section 5.3.1). Nevertheless, some interesting parallels exist between linguistic and musical syntactic trees at an abstract level, particularly between linguistic trees and prolongation-reduction trees in GTTM. First, just as each node of a linguistic tree terminates on a linguistic grammatical category (e.g., noun, verb, preposition), each node of a prolongation-reduction tree terminates on a musical grammatical category: a chord assigned to a particular harmonic function in a given key (e.g., I, IV). That is, in both cases the tree structures relate grammatical categories in a hierarchical fashion, and in both cases the same grammatical categories can be filled by different members of the same category. For example, one can have the same sentence structure with different words, and the same harmonic structure with different chords (such as chords in different inversions, or if the key is changed, an entirely different set of chords). Note that in making this comparison, there is no claim for a direct correspondence between categories in the two domains (e.g., between tonic chords and nouns).
One final similarity regarding hierarchical syntactic relations in language and music bears mention, namely the shared capacity for recursive syntactic structure. In language, phrases can be embedded within phrases of the same type, for example, in Figure 5.8 the noun phrase “the boy” is embedded within the larger noun phrase “the girl who kissed the boy.” In music, small-scale patterns of tension and relaxation can be embedded in larger tension-relaxation patterns of identical geometry but of a longer timescale (see Lerdahl & Jackendoff, 1983:207, for an example). Recursive syntactic structure has been proposed as a feature that distinguishes human language from nonhuman communication systems (Hauser et al., 2002; though see Gentner et al., 2006). If this is the case, then human music, no less than human language, is fundamentally different from animal acoustic displays such as bird song and whale song.
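Recursive embedding of this kind is naturally expressed as a self-referential data structure. The sketch below is my own illustration (the class and labels are not GTTM notation): a tension-relaxation node whose children are nodes of the same type, just as a noun phrase can contain another noun phrase.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TensionNode:
    """A node in a toy tension-relaxation tree; children are trees of the same type."""
    label: str                                # e.g. a chord function such as "I" or "V"
    children: Tuple["TensionNode", ...] = ()

    def depth(self) -> int:
        return 1 + max((c.depth() for c in self.children), default=0)

# a small tension-relaxation pattern embedded within a larger one of the same shape
inner = TensionNode("V", (TensionNode("I"),))
outer = TensionNode("I", (inner, TensionNode("I")))
print(outer.depth())  # 3
```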
Although the previous section dealt with hierarchical structure, this section is concerned with nonhierarchical aspects of the “logical structure” of syntax in the two domains. For example, both domains recognize a distinction between “structural” and “elaborative” elements in sequences. In language, elaborative elements take the form of modifiers such as adjectives and adverbs. In tonal music, as discussed in a previous section, elaborative elements are identified on the basis of relative importance in the tonal hierarchy, together with rhythmic and motivic information. Although the means for distinguishing structure from elaboration are quite different in language and music, in both domains this conceptual distinction plays a role in organizing communicative sequences.
Another parallel in logical structure concerns grammatical functions in language and music. In the syntax of language, such functions include subject, object, and indirect object. These are logical functions that words take on with respect to other words in a sentence context, rather than being inherent properties of isolated words. The evidence that such a level of organization exists is that there are a number of grammatical principles that refer to these functions, such as verb agreement, which requires agreement between a subject or an object and its verb (Jackendoff, 2002:149). Tonal music also has a system of grammatical functions, discussed as “harmonic functions” in section 5.2.3 above. Such functions pertain to the structural role a chord plays in a particular key. As noted in that section, the harmonic function of a chord derives from its context and its relation to other chords rather than from intrinsic properties of the chord itself. Typically, three such functions are recognized: tonic, subdominant, and dominant, prototypically instantiated by the I, IV, and V chords of a key, respectively. The same chord (e.g., C-E-G) can be a tonic chord in one key but a dominant or subdominant chord in other keys, and empirical research shows that listeners are quite sensitive to this functional difference, as discussed in section 5.2.3. (Note that the tonic function is not limited to major chords: In minor keys, the tonic function is played by a minor chord, e.g., A-C-E in A minor; Krumhansl et al., 1982.) Conversely, two distinct chords in the same key—for example, a IV chord and a II6 chord—can have the same harmonic function by virtue of the way in which they are used (cf. section 5.2.3).8 The salient point is that a chord’s harmonic function is a psychological property derived from its relation to other chords. Thus music, like language, has a system of context-dependent grammatical functions that are part of the logical structure of communicative sequences.
There is, of course, no claim for any mapping between functions in the two domains, for example, between subjects in language and tonics in music.
As an aside, readers may be curious why the number of harmonic functions in music has generally been recognized as three, when there are seven distinct chord categories in any key. This theory is based on the idea that chords whose roots are separated by a musical fifth or a musical second (e.g., I and V, or I and ii) are functionally different, whereas chords whose roots are separated by a musical third (e.g., I and vi, or ii and IV) are functionally similar (Dahlhaus, 1990:58). The philosophical basis of this theory lies in the work of the 19th-century music theorist Hugo Riemann, who related the different chord functions to three different logical aspects of thought: thesis (tonic), antithesis (subdominant), and synthesis (dominant; Dahlhaus, 1990:51-52). As noted by Dahlhaus, Riemann’s basic thesis was “that the act of listening to music is not a passive sufferance of the effects of sound on the organ of hearing, but is much more a highly developed application of the logical functions of the human mind.” Thus Riemann arrived at the notion of three distinct functions without any reference to linguistic grammatical relations.
The above sections have confirmed that there are important differences between linguistic and musical syntax, but have also shown that these differences need not prevent the recognition and exploration of formal similarities between syntax in the two domains. The key to successful comparison is to avoid the pitfall of looking for musical analogies of linguistic syntactic entities and relations, such as nouns, verbs, and the constituent structure of linguistic syntactic trees. Once this pitfall is avoided, one can recognize interesting similarities at a more abstract level, in what one might call the “syntactic architecture” of linguistic and musical sequences. These include the existence of multiple levels of combinatorial organization, hierarchical (and recursive) structuring between elements in sequences, grammatical categories that can be filled by different physical entities, relationships of structure versus elaboration, and context-dependent grammatical functions involving interdependent relations between elements. These similarities are interesting because they suggest basic principles of syntactic organization employed by the human mind.
The past decade has seen the birth of a new approach to music-language syntactic relations, based on empirical research in cognitive neuroscience. A primary motivation for this work has been a debate over the extent to which linguistic syntactic operations are “modular,” conducted by a cognitive subsystem dedicated to linguistic function and largely independent of other forms of brain processing (Fodor, 1983; Elman et al., 1996). Music provides an ideal case for testing this claim. As we have seen, musical and linguistic syntax are rich systems that are not trivial variants of one another. Are musical and linguistic syntax neurally independent, or is there significant overlap? Any significant overlap would serve as compelling evidence in the debate over modularity. Furthermore, exploring this overlap could provide novel insights into fundamental syntactic operations in the human brain.
Although the prospect of overlap is stimulating, evidence from neuroscience long seemed to disfavor it. Specifically, neuropsychology provided well-documented cases of dissociations between musical and linguistic syntactic abilities. For example, individuals with normal speech and language abilities may show impaired perception of musical tonality following brain damage or due to a lifelong condition of musical tone-deafness (amusia without aphasia; Peretz, 1993; Peretz et al., 1994; Ayotte et al., 2000; Ayotte et al., 2002). Conversely, there are persons with severe language impairments following brain damage but with spared musical syntactic abilities (aphasia without amusia; e.g., Luria et al., 1965). This double dissociation between amusia and aphasia has led to strong claims about the independence of music and language in the brain. For example, Marin and Perry (1999) state that “these cases of total dissociation are of particular interest because they decisively contradict the hypothesis that language and music share common neural substrates” (p. 665). Similarly, Peretz and colleagues have used such dissociations to argue for a highly modular view of musical tonality processing (Peretz & Coltheart, 2003; Peretz, 2006).
As we shall see, there are reasons to doubt that this view is correct. One reason concerns neuroimaging evidence from healthy individuals processing syntactic relations in music and language. This evidence suggests far more neural overlap than one would expect based on dissociations in brain-damaged cases. Another reason concerns the nature of the evidence for aphasia without amusia, as discussed in the next section. These two factors have led to a reevaluation of syntactic relations between music and language in the brain. One outcome of this reevaluation is the hypothesis that the two domains have distinct and domain-specific syntactic representations (e.g., chords vs. words), but that they share neural resources for activating and integrating these representations during syntactic processing (Patel, 2003b). This “shared syntactic integration resource hypothesis” (SSIRH) is explained in more detail in section 5.4.3. For now, suffice it to say that the SSIRH can account for the apparent contradiction between neuropsychology and neuroimaging, and that it suggests a deep connection between musical and linguistic syntax in the brain.
The remainder of section 5.4 is divided into four parts. The first part reviews some of the classical evidence for neural dissociations between musical and linguistic syntax. The second part discusses neuroimaging research that contradicts this traditional picture. The third part discusses the use of cognitive theory to resolve the apparent paradox between neuropsychology and neuroimaging, and introduces the SSIRH. The fourth part discusses predictions of the SSIRH and how these predictions are faring in empirical research.
There is good evidence that musical syntactic deficits can exist in the absence of linguistic difficulties. An exemplary case is provided by the patient G.L., investigated by Peretz and colleagues (Peretz, 1993; Peretz et al., 1994). G.L. had bilateral temporal lobe damage, with infarctions on both sides of the brain due to strokes. This is a rare neurological occurrence, but is not infrequent among cases of acquired amusia. In G.L.’s case, primary auditory cortex was spared, but there was damage to rostral superior temporal gyri, which encompass several auditory association areas (Peretz et al., 1994; cf. Tramo et al., 1990). G.L. was a well-educated individual who had been an avid music listener, though he had no formal musical training. Ten years after his brain damage, G.L. was referred for neuropsychological testing because of persistent problems with music perception. Peretz and colleagues administered a large battery of tests to study the nature of G.L.’s musical deficits, ranging from simple pitch discrimination to melody discrimination and tests for sensitivity to tonality. G.L. could discriminate changes between single pitches and was sensitive to differences in melodic contour in short melodies. He also showed some residual sensitivity to patterns of pitch intervals (e.g., in tests involving discrimination of melodies with the same contour but different intervals). What was most striking about his case, however, was his complete absence of sensitivity to tonality. For example, G.L. was given a probe-tone task in which a few notes (which establish a musical key) were followed by a target tone. The task was to rate how well the target fit with the preceding context (cf. Cuddy & Badertscher, 1987; and Chapter 4, section 4.2.6, for background on probe-tone studies). Normal controls showed the standard effect of tonality: Tones from the key were rated higher than out-of-key tones.
G.L., in contrast, showed no such effect, and tended to base his judgments on the pitch distance between the penultimate and final tone. He also failed to show an advantage for tonal versus atonal melodies in short-term memory tasks, in contrast to controls. Additional experiments showed that his problems could not be accounted for by a general auditory memory deficit. Most importantly for the current purposes, G.L. scored in the normal range on standardized aphasia tests, showing that he had no linguistic syntactic deficit.
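The contrast between the controls' tonality effect and G.L.'s proximity-based judgments can be sketched as two toy rating strategies. The pitch-class numbering, the thresholds, and both "raters" below are illustrative assumptions, not Peretz et al.'s test items or data:

```python
# Toy contrast between the two rating strategies described above:
# controls rate probe tones by key membership (the tonality effect),
# whereas G.L.'s judgments tracked pitch proximity to the penultimate
# tone. All numbers and thresholds are illustrative assumptions.
C_MAJOR_PCS = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale

def control_rating(probe_pc):
    """Higher rating for in-key probes: the normal tonality effect."""
    return 2 if probe_pc % 12 in C_MAJOR_PCS else 1

def gl_rating(penultimate_pc, probe_pc):
    """Proximity-based rating: nearby pitches judged to fit better."""
    return 2 if abs(probe_pc - penultimate_pc) <= 2 else 1

# After a C major context ending on F (pc 5), controls prefer the in-key
# probe G (pc 7) over the out-of-key F-sharp (pc 6); a proximity-based
# rater treats both alike, i.e., it shows no tonality effect.
```

The point of the sketch is only that a proximity-based strategy is blind to key membership, which is why G.L.'s ratings showed no in-key advantage.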
G.L. is one of a handful of well-documented cases of acquired amusia. Cases of musical tone-deafness or congenital amusia are much more common (cf. Chapter 4, section 4.5.2, second subsection, for background on musical tone deafness). For example, Ayotte et al. (2002) presented a group study of 11 such individuals. These were well-educated persons with no other neurological or psychiatric problems, whose tone deafness could not be attributed to lack of exposure to music (indeed, all had music lessons in childhood). When tested, all showed a variety of musical deficits. Crucially, all failed a melodic sour-note detection task, which is a simple test of musical syntactic abilities. Normal individuals enculturated in Western music find this an easy task, even if they have had no formal musical training.
Acquired and congenital amusia thus provide compelling evidence that musical syntax can be disrupted without associated linguistic syntactic deficits. What about the reverse condition? The relevant evidence comes from cases of aphasia without amusia. The most often-cited case is that of the Russian composer Vissarion Shebalin (1902-1963). Shebalin attracted the interest of the famous neuropsychologist A. R. Luria (Luria et al., 1965), who commented that “the relationship of the two kinds of acoustic processes, namely verbal and musical, constitutes one of the most interesting problems of cortical neurology” (p. 288). Shebalin suffered two strokes in his left hemisphere, affecting the temporal and parietal regions. After the second stroke, he had severe difficulties in comprehending and producing language. Shebalin died 4 years after this second stroke, but in those few years he composed at least nine new pieces, including a symphony hailed by the Soviet composer Shostakovich as a “brilliant creative work” (p. 292).
Shebalin’s case is not unique. Tzortzis et al. (2000, Table 4) list six published cases of aphasia without amusia, and report a seventh case in their own paper. However, Tzortzis et al. also point out a crucial fact about these cases: They are all professional musicians. Indeed, most cases represent composers or conductors, individuals with an extraordinarily high degree of musical training and achievement. (Note that this stands in sharp contrast to case studies of amusia without aphasia, which typically involve nonmusicians.) The question, then, is whether findings based on highly trained musicians can be generalized to ordinary individuals. There are reasons to suspect that the answer to this question is “no.” Research on neural plasticity has revealed that the brains of professional musicians differ from those of nonmusicians in a variety of ways, including increased gray matter density in specific regions of frontal cortex and increased corpus callosum size (Schlaug et al., 1995; Gaser & Schlaug, 2003). This suggests that generalizations about language-music relations in aphasia cannot be drawn on the basis of case studies of professional musicians. To conclusively show a double dissociation between amusia and aphasia, one needs evidence of aphasia without amusia in nonmusicians. Peretz and colleagues (2004) have argued that such cases exist, but closer examination of their cited cases reveals that these individuals suffered from a phenomenon known as “pure word deafness.” Although pure word deafness is sometimes referred to as a form of aphasia, it is in fact an auditory agnosia. An individual with pure word deafness can no longer understand spoken material but can understand and/or produce language in other modalities (i.e., writing). This is qualitatively different from true aphasia, which is a deficit of core language functions that cuts across modalities (Caplan, 1992).
The relevant point for the current discussion is that there has not been a convincing demonstration of a double dissociation between musical and linguistic syntactic abilities in ordinary individuals with brain damage. Indeed, as discussed later, new evidence from aphasia (in nonmusicians) points to an association between linguistic and musical syntactic disorders. Furthermore, the phenomenon of musical tone deafness, in which otherwise normal individuals exhibit musical syntactic deficits, is largely irrelevant to the question of music-language syntactic relations, as we shall see. Before discussing these issues, however, it is important to review some of the neuroimaging evidence that challenged the idea of separate processing for linguistic and musical syntax in the human brain.
One of the first studies to compare brain responses to linguistic and musical syntactic processing in the same set of individuals was that of Patel, Gibson, et al. (1998). This study was inspired by two earlier lines of work. First, research on brain responses to harmonic incongruities in music, such as an out-of-key note at the end of a melodic sequence (Besson & Faïta, 1995), had revealed that these incongruities elicited a positive-going event-related brain potential (ERP). This positive ERP stood in sharp contrast to the commonly observed negative-going ERP associated with semantic anomalies in language (the N400, which peaks about 400 ms after the onset of a semantically incongruous word, such as the word “dog” in “I take my coffee with cream and dog”; Kutas & Hillyard, 1984). The second line of research concerned brain responses to syntactic (rather than semantic) incongruities in language (Osterhout & Holcomb, 1992, 1993; Hagoort et al., 1993). This research revealed that when a word disrupted the syntactic form of a sentence, a positive-going ERP was generated. This ERP was referred to as the P600, because it peaked about 600 ms after the onset of the incongruous word (e.g., after the onset of “was” in “The broker hoped to sell the stock was sent to jail”).9
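The ERP technique these studies rely on can be illustrated with a minimal averaging sketch: EEG epochs time-locked to word onset are noisy single trials, but averaging across trials cancels random background activity and leaves the event-related component. The sampling rate, trial count, and the simulated "P600"-like positivity below are invented for illustration, not parameters of any cited study:

```python
# Minimal sketch of ERP averaging: single-trial EEG = component + noise;
# averaging 40 time-locked trials reveals a positivity peaking near
# 600 ms (P600-like). All numbers are illustrative assumptions.
import math
import random

random.seed(0)
fs = 250  # samples per second (assumed)
t = [i / fs * 1000.0 for i in range(250)]  # 0-996 ms after word onset

# Underlying component: a Gaussian positivity centered at 600 ms.
component = [2.0 * math.exp(-((x - 600.0) ** 2) / (2 * 100.0 ** 2))
             for x in t]

# 40 noisy single trials: each sample adds Gaussian "EEG noise".
trials = [[s + random.gauss(0, 5) for s in component] for _ in range(40)]

# Averaging across trials yields the ERP; its peak sits near 600 ms.
erp = [sum(samples) / len(samples) for samples in zip(*trials)]
```

This is also why ERPs have excellent time resolution: the averaged waveform preserves millisecond timing relative to stimulus onset, even though it says little about the spatial source of the signal.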
The question asked by Patel, Gibson, et al. (1998) was whether harmonically incongruous chords within chord sequences would generate P600s akin to those elicited by syntactically incongruous words in sentences. If so, then this would suggest that some aspect of syntactic processing was shared between the two domains.
Before discussing this study in more detail, it is worth making an important point about the N400 and P600. These ERPs have been most often studied in the context of incongruities (semantic or syntactic), but it is a mistake to think they are therefore simply “error signals” emitted by the brain due to surprise, attention switching, and so forth. Crucially, both of these ERPs can be elicited in sentences without any semantic or syntactic errors. For example, if ERPs are measured to each word of the sentences “The girl put the sweet in her mouth after the lesson” and “The girl put the sweet in her pocket after the lesson,” a comparison of the ERP at “pocket” versus “mouth” reveals an N400 to the former word (Hagoort, et al., 1999). This reflects the fact that “pocket” is a less semantically predictable word than “mouth” given the context up to that point. The N400 is thus a sensitive measure of semantic integration in language.
Similarly, the P600 can be elicited without any syntactic error (cf. Kaan et al., 2000; Gouvea et al., submitted). For example, if the word “to” is compared in the sentences “The broker hoped to sell the stock” and “The broker persuaded to sell the stock was sent to jail,” a P600 is observed to “to” in the latter sentence, even though it is a perfectly grammatical sentence (Osterhout & Holcomb, 1992). This is because in the former sentence, the verb “hoped” unambiguously requires a sentential complement, so that “to” is structurally allowed. In the latter sentence, however, when the word “persuaded” is first encountered, a simple active-verb interpretation is possible and tends to be preferred (e.g., “The broker persuaded his client to sell his shares”). This interpretation does not permit the attachment of a constituent beginning with “to.” As a consequence, there is some syntactic integration difficulty at “to,” although it soon becomes obvious that the verb “persuaded” is actually functioning as a reduced relative clause (i.e., “The broker who was persuaded . . .”), and sentence understanding proceeds apace. Thus although frank anomalies are often used to elicit the N400 and P600, it is important to note that this is simply a matter of expediency rather than of necessity.
Returning to our study (Patel, Gibson, et al., 1998), we constructed sentences in which a target phrase was either easy, difficult, or impossible to integrate with the preceding syntactic context, such as the following:
(5.1a) Some of the senators had promoted an old idea of justice.
(5.1b) Some of the senators endorsed promoted an old idea of justice.
(5.1c) Some of the senators endorsed the promoted an old idea of justice.
The syntactic structure of sentence 5.1b is considerably more complex than that of 5.1a, as shown in Figure 5.11. (Note that in sentence 5.1b, the verb “endorsed” has the same type of ambiguity as the verb “persuaded” in the discussion above.)
Thus in sentence 5.1b, the target phrase should be more difficult to integrate with the preceding structure than in 5.1a. In contrast to sentences 5.1a and 5.1b, which are grammatical sentences, 5.1c is ungrammatical, making the target impossible to integrate with the previous context.
We also constructed sequences of 7-12 chords in which a target chord within the middle part of the phrase was designed to vary in its ease of integration with the prior context. We based our musical design principles on previous research showing that listeners were sensitive to key distance in music. Thus the target chord was either the tonic chord of the key of the sequence, or the tonic chord of a nearby or distant key. “Nearby” and “distant” were defined using the circle of fifths for major keys: A nearby key was three counterclockwise steps away on the circle of fifths, and a distant key was five counterclockwise steps away (Figure 5.12).
Figure 5.11 Syntactic structures for the simple and complex sentences in the study of Patel, Gibson et al. (1998). Most symbols are explained in the caption of Figure 5.8; N’ = noun phrase projection, O = operator, t = trace. The sentence in (B) is substantially more complex than the sentence in (A) because the verb “endorsed” functions as a reduced relative clause (i.e., “some of the senators who were endorsed . . .”).
For example, if the chord sequence was in the key of C major, then the target chords were: C major (C-E-G), E-flat major (E♭-G-B♭), or D-flat major (D♭-F-A♭). This design had the advantage that the two out-of-key chords had the same number of out-of-key notes relative to C major, so that differences in harmonic incongruity could not be attributed to differences in the number of out-of-key notes within the chords.
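The key-distance scheme just described reduces to simple arithmetic on the circle of fifths. A minimal sketch, assuming the standard clockwise ordering of the twelve major keys (the helper name is mine, not from the study):

```python
# Nearby vs. distant target keys in Patel, Gibson, et al. (1998) are
# 3 vs. 5 counterclockwise steps from the phrase's key on the circle
# of fifths for major keys (standard clockwise ordering below).
CIRCLE_OF_FIFTHS = ["C", "G", "D", "A", "E", "B",
                    "F#", "Db", "Ab", "Eb", "Bb", "F"]

def counterclockwise_key(home_key, steps):
    """Major key reached by moving `steps` counterclockwise from home_key."""
    i = CIRCLE_OF_FIFTHS.index(home_key)
    return CIRCLE_OF_FIFTHS[(i - steps) % len(CIRCLE_OF_FIFTHS)]

# For a sequence in C major, the three target-chord keys work out to:
tonic = "C"
nearby = counterclockwise_key("C", 3)   # "Eb" (nearby key)
distant = counterclockwise_key("C", 5)  # "Db" (distant key)
```

With this encoding, the three conditions for a C-major sequence come out as C, E-flat, and D-flat major, matching the example above.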
Two other aspects of the chord sequences bear mention. First, the target chord always occurred after a dominant (V or V7), and was thus always in a specific harmonic context. Second, the chord sequences were composed in a popular, rather than a classical style: They sound like musical jingles rather than like traditional four-part harmony (Sound Examples 5.3a–c). Thus for both language and music, we used syntactic principles to construct sequences with varying degrees of syntactic incongruity. A single group of 15 musically trained listeners heard numerous linguistic and musical sequences like those shown above, and for each sequence, judged whether it sounded normal or structurally odd (we chose to work with musically trained listeners because we wanted to ensure sensitivity to musical syntax). ERPs to the target phrases in language were compared to ERPs to the target chords in music. The primary result of interest was that incongruities in both domains elicited P600s, and that these ERPs were statistically indistinguishable in amplitude and scalp distribution at both the moderate and strong levels of incongruity. (Figure 5.13 shows ERPs to linguistic and musical targets at the strong level of incongruity.) This demonstrated that the P600 was not a signature of a language-specific syntactic process. Patel et al. suggested that the P600 may reflect domain-general structural integration processes in both domains (I return to this point in section 5.4.3, subsection “Reconciling the Paradox”).
Figure 5.12 Example chord sequences from Patel, Gibson et al.’s (1998) study. The position of the target chord is indicated by the arrow. (A) The target chord is the tonic of the key of the phrase (a C major chord in this case, as the phrase is in the key of C). (B) The target chord is the tonic of a nearby key (E-flat major). (C) The target chord is the tonic of a distant key (D-flat major). Nearby and distant keys are defined as three versus five counterclockwise steps away from the key of the phrase on the circle of fifths for keys (cf. Figure 5.6). Note that each stimulus has the same chord progression but differs slightly in terms of the inversion of chords before and after the target (though the two chords before and one chord after the target are held constant).
What can be said about the underlying neural source of the P600s in this study? Both the linguistic and musical P600s were maximal over temporal/posterior regions of the brain, but it is difficult to make any conclusions about the precise location of their underlying sources on this basis. This is because the ERP technique has excellent time resolution but poor spatial resolution. One thing that can be said with confidence, however, is that the generators of the P600 are unlikely to be strongly lateralized, as the ERP was symmetric across the left and right sides of the brain. (We also observed an unexpected and highly lateralized right anterior-temporal negativity elicited by out-of-key chords, which peaked at about 350 ms posttarget-onset. See Patel, Gibson, et al., 1998, for discussion, and Koelsch and Mulder, 2002, for a similar finding using more naturalistic musical materials and nonmusician listeners.)
Figure 5.13 Traces show ERPs to linguistic (solid line) and musical (dashed line) syntactic incongruities from three electrodes along the middle of the head (Fz = front; Cz = vertex; Pz = back). (The schematic on the left side of the figure shows the electrode positions as if looking down on the head from above.) The ERP responses are highly similar in the vicinity of 600 ms. The continued positivity of the linguistic P600 beyond 600 ms is due to the continuing ungrammaticality of the sentence beyond this point. See Patel, Gibson et al., 1998, for details.
Subsequent work on musical syntactic processing has supported the case for syntactic overlap between language and music by showing that musical syntactic processing activates “language” areas of the brain. Maess et al. (2001) provided evidence from MEG that an early right anterior negativity (ERAN) associated with harmonic processing in music originates in a left frontal brain area known as Broca’s area and its right hemisphere homologue (cf. Koelsch & Siebel, 2005). An fMRI study of harmonic processing (Tillmann et al., 2003; cf. Tillmann, 2005) also reported activation of these areas, and a second such study (Koelsch et al., 2002) implicated both Broca’s and Wernicke’s areas in musical harmonic processing (cf. Levitin & Menon, 2003; Brown et al., 2006).
These findings from neuroimaging, which point to overlap between linguistic and musical syntax in the brain, stand in sharp contrast to the evidence for music-language dissociations provided by neuropsychology (cf. section 5.4.1). Any attempt to understand the neural relationship of musical and linguistic syntax must come to terms with this paradox. One such attempt, based on cognitive theory in psycholinguistics and music cognition, was offered by Patel (2003b). An updated version of this approach is offered in the next section.
Cognitive theories of language and music suggest that the mental representations of linguistic and musical syntax are quite different. For example, as discussed in section 5.3.1, one very important difference between syntax in language and music is that language has universal grammatical categories (such as nouns and verbs) and grammatical functions (subject, direct object, and indirect object) that are unique to language. Furthermore, the perception of long-distance syntactic dependencies is much more constrained by linguistic structure than by musical structure. In music, proposed hierarchical patterns (such as that in Figure 5.10) are best viewed as hypotheses subject to empirical test.
These observations suggest that the overlap in linguistic and musical syntax is not at the level of representation. Thus one way to break the paradox outlined above is to propose a conceptual distinction between syntactic representation and syntactic processing. This can be understood as the distinction between long-term structural knowledge in a domain (i.e., in associative networks that store knowledge of words and chords) and operations conducted on that knowledge for the purpose of building coherent percepts. A key idea of this approach is that some of the processes involved in syntactic comprehension rely on brain areas separate from those in which syntactic representations reside. Such “dual system” approaches have been proposed by several researchers concerned with the neurolinguistics of syntax. For example, Caplan and Waters (1999) have suggested that frontal areas of the brain support a special working memory system for linguistic syntactic operations, and Ullman (2001) has suggested that frontal areas contain a symbol-manipulation system for linguistic syntax. The approach taken here is a dual system approach, but does not propose that linguistic and musical syntax share a special memory system or symbol manipulation system. Instead, a hypothesis for what is shared by linguistic and musical syntactic processing is derived from comparison of cognitive theories of syntactic processing in the two domains.
Before introducing these theories, two related points should be made. First, there are theoretical approaches to linguistic syntax that reject a separation between representation and processing, and artificial neural network (“connectionist”) models in which syntactic representation and processing occur in the same network (MacDonald & Christiansen, 2002), and thus by implication in the same brain areas. Second, the theories considered below are by no means the only theories of syntactic processing in language and music. They were chosen because of their strong empirical basis and because they show a remarkable point of convergence.
Gibson’s dependency locality theory (DLT; Gibson, 1998, 2000) was developed to account for differences in the perceived complexity of grammatical sentences and for preferences in the interpretation of syntactically ambiguous sentences. DLT posits that linguistic sentence comprehension involves two distinct components, each of which consumes neural resources. One component is structural storage, which involves keeping track of predicted syntactic categories as a sentence is perceived in time (e.g., when a noun is encountered, a verb is predicted in order to form a complete clause). The other component is structural integration, in other words, connecting each incoming word to a prior word on which it depends in the sentence structure. A basic premise of this theory is that the cost of integration is influenced by locality: Cost increases with the distance between the new element and the site of integration. Distance is measured as the number of new “discourse referents” (nouns and verbs) since the site of integration. Thus DLT uses a linear measure of distance rather than a hierarchical one (e.g., based on counting nodes in a syntactic tree), and so does not depend on the details of any particular phrase structure theory.
To illustrate DLT’s approach, consider the relationship between the words reporter and sent in the sentences:
(5.2a) The reporter who sent the photographer to the editor hoped for a story.
(5.2b) The reporter who the photographer sent to the editor hoped for a story.
In sentence 5.2a, when “sent” is reached, integration with its dependent “reporter” is relatively easy because the words are nearly adjacent in the sentence. In sentence 5.2b, however, the integration between “sent” and “reporter” (now the object of the verb) is more difficult, because it must cross an intervening noun phrase, “the photographer.”10 A strength of this theory is its ability to provide numerical predictions of the processing (storage plus integration) cost at each word in a sentence. Figure 5.14 illustrates the numerical accounting system for integration cost using the example sentences given above.11
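DLT's distance measure can be sketched as a count of new discourse referents intervening between an incoming word and its integration site. The word lists and the referent set below are simplified illustrations, not the full cost accounting of Gibson (1998, 2000):

```python
# Toy version of DLT's locality metric: integration cost grows with the
# number of new discourse referents (nouns and verbs) between a word
# and its prior dependent. Referent tagging is hand-coded here purely
# for illustration.
DISCOURSE_REFERENTS = {"reporter", "photographer", "editor",
                       "sent", "hoped", "story"}

def integration_distance(words, dependent_idx, incoming_idx):
    """New discourse referents strictly between the prior dependent
    and the incoming word (DLT's linear distance measure)."""
    between = words[dependent_idx + 1:incoming_idx]
    return sum(1 for w in between if w in DISCOURSE_REFERENTS)

subj_rc = "the reporter who sent the photographer to the editor".split()
obj_rc = "the reporter who the photographer sent to the editor".split()

# Linking "sent" back to "reporter" in each sentence:
cost_a = integration_distance(subj_rc, subj_rc.index("reporter"),
                              subj_rc.index("sent"))  # 0: nearly adjacent
cost_b = integration_distance(obj_rc, obj_rc.index("reporter"),
                              obj_rc.index("sent"))   # 1: crosses "photographer"
```

The higher count for the object-extracted version (5.2b) mirrors the greater integration difficulty at "sent" described above.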
The numerical predictions of DLT can be empirically tested in reading time experiments in which the amount of time spent viewing each word of a sentence on a computer screen is quantified. The assumption of such experiments is that longer reading time is a reflection of greater processing cost. DLT has been supported by empirical research on sentence processing in English and other languages (e.g., Warren & Gibson, 2002; Grodner & Gibson, 2005). The relevant aspect of the theory for the current purpose is the idea that mentally connecting distant elements requires more resources.12
DLT provides one account of syntactic integration difficulty in language, namely, difficulty due to distance between an incoming word and its prior dependent word. A different theoretical perspective suggests that syntactic integration difficulty is associated with how well a word fits a perceiver’s syntactic expectations at that point. The underlying assumption of this view is that at each point during sentence comprehension, a perceiver has specific expectations for upcoming syntactic categories of words (Narayanan & Jurafsky, 1998, 2002; Hale, 2001; Levy, in press; cf. Lau et al., 2006, for neural data). These expectations reflect structural analyses of the sentence currently being considered by the parsing mechanism. When a word is encountered that does not match the most favored analysis, resources must be reallocated in order to change the preferred structural interpretation. Such an explanation can account for a number of different sentence processing effects, including difficulty caused by “garden-path” sentences in which a comprehender encounters a syntactically unexpected word, such as “to” in “The broker persuaded to sell the stock was sent to jail” (cf. section 5.4.2).
Figure 5.14 Example of distance computation in DLT. Links between dependent words are shown by curved lines, and the distances associated with each link are shown by the integers below the curved line. The number below each word shows the total distance of that word from its prior dependent words. The total distance is used as a measure of the integration cost for that word. Combining integration costs with storage costs (not shown) yields a total processing cost for each word, which can be compared to empirical data from reading time experiments.
The expectancy theory of syntactic processing has old roots (Marslen-Wilson, 1975; cf. Jurafsky, 2003 for a historical perspective) but has only recently begun to be systematically investigated using psycholinguistic methods. Notably, the approach can successfully account for sentence processing effects not predicted by DLT. For example, Jaeger et al. (2005, described in Levy, in press) had participants read sentences that varied in the size of an embedded relative clause (marked by brackets below):
(5.3a) The player [that the coach met at 8 o’clock] bought the house . . .
(5.3b) The player [that the coach met by the river at 8 o’clock] bought the house . . .
(5.3c) The player [that the coach met near the gym by the river at 8 o’clock] bought the house . . .
According to DLT, the verb “bought” should be harder to integrate with its prior dependent (“player”) as the size of the intervening relative clause increases. In fact, precisely the opposite pattern of results was found: Reading times on “bought” became shorter as the size of the relative clause increased. Expectancy theory predicts this result, because as each additional modifying phrase occurs within the relative clause, a main verb (“bought,” in this case) becomes more expected. This in turn likely reflects experience with the statistics of language, in which short relative clauses are more common than long ones (as confirmed by Jaeger et al.’s measurements of corpus statistics). For other studies supporting expectancy theory, see Konieczny (2000), and Vasishth and Lewis (2006).
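The expectancy account is often formalized as surprisal: the processing cost of a word is proportional to the negative log probability of that word given its context (cf. Hale, 2001; Levy, in press). A minimal sketch, with probabilities invented for illustration rather than estimated from a corpus:

```python
import math

# Toy sketch of expectancy-based cost as surprisal: cost = -log2 P(w | context).
# The probabilities below are hypothetical, for illustration only.

def surprisal(prob):
    """Processing cost, in bits, of an event with probability prob."""
    return -math.log2(prob)

# As a relative clause grows, a main verb becomes increasingly expected,
# so its conditional probability rises and its surprisal (cost) falls --
# the opposite of DLT's distance-based prediction.
p_verb_after_short_rc = 0.25   # hypothetical: P(main verb | short RC)
p_verb_after_long_rc = 0.50    # hypothetical: P(main verb | long RC)

cost_short = surprisal(p_verb_after_short_rc)   # -> 2.0 bits
cost_long = surprisal(p_verb_after_long_rc)     # -> 1.0 bit
```

On this view, Jaeger et al.’s shorter reading times on “bought” after longer relative clauses fall out directly: greater expectedness means lower surprisal, hence lower cost.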
Notably, however, there are cases in which DLT makes more accurate predictions than expectancy theory. For example, in the example sentences in the previous subsection, when “sent” is encountered in sentence 5.2b, its grammatical category (verb) is highly expected, yet the word is still difficult to integrate due to its distance from “reporter.” Hence at the current time, it appears that DLT and expectancy theory are successful in different circumstances, meaning that work is needed to reconcile these two theories. (For one recent framework that seeks to unify distance-based effects and expectancy-based effects, see Lewis et al., 2006.) For the current purposes, however, the relevant point is that both DLT and expectancy theory posit that difficult syntactic integrations consume processing resources used in building structural representations of sentences.
Lerdahl’s (2001) tonal pitch space (TPS) theory concerns the perception of pitch in a musical context. It builds on the empirical findings about the perceived relations between scale tones, chords, and keys outlined in section 5.2.1, and illustrated in Figures 5.3, 5.5, and 5.7. The main formalism used to represent these relations is a “basic space” organized as a hierarchy of pitch alphabets (based on Deutsch & Feroe, 1981). Figure 5.15 shows a basic-space representation of a C major chord in the context of the C major key.
As noted by Lerdahl, “Each level of the space elaborates into less stable pitch classes at the next [lower] level; conversely, the more stable pitch classes at one level continue on to the next [upper] level. The structure is asymmetrical and represents the diatonic scale and the triad directly” (p. 48). The basic space provides a mechanism for computing the psychological distance between any two musical chords in a sequence. The algorithm for computing distance involves measuring how much one has to shift a chord’s representation in the basic space to transform it into another chord. The details of this algorithm are beyond the scope of this book (see Lerdahl, 2001 for details); what is important is that the basic space provides an algebraic method for computing chord distances in a manner that incorporates the tripartite distances of pitch classes, chords, and keys, and yields a single distance value that can be expressed as an integer.
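The shape of the basic space can be sketched directly. The representation below of a C major chord in the key of C major mirrors the numerical format of Figure 5.15 (pitch classes numbered C=0 through B=11); the check at the end reflects Lerdahl’s point that the stable pitch classes at each level carry over to the next level. The distance algorithm itself is omitted here, as in the text:

```python
# Toy sketch of Lerdahl's "basic space" for a C major chord in the key of
# C major (cf. Figure 5.15). Pitch classes: C=0, C#=1, ..., B=11.
# Illustrates the layered representation only, not the TPS distance rule.

basic_space_C_major_chord = {
    "root":      {0},                        # octave level: C
    "fifth":     {0, 7},                     # C, G
    "triad":     {0, 4, 7},                  # C, E, G
    "diatonic":  {0, 2, 4, 5, 7, 9, 11},     # C major scale
    "chromatic": set(range(12)),             # all 12 pitch classes
}

# The hierarchy is nested: every pitch class present at a more stable
# (higher) level also appears at each less stable (lower) level.
levels = ["root", "fifth", "triad", "diatonic", "chromatic"]
for upper, lower in zip(levels, levels[1:]):
    assert basic_space_C_major_chord[upper] <= basic_space_C_major_chord[lower]
```

Chord distance in TPS is then computed, roughly, by measuring how much this layered structure must be shifted to represent another chord; that algorithm is detailed in Lerdahl (2001).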
Figure 5.15 Lerdahl’s basic space for representing musical chords in a manner that incorporates multiple levels of pitch structure. Panel (A) shows the representation of a C major triad in the context of the C major key, and panel (B) shows the same in a numerical format. The basic space provides a mechanism for measuring distances between chords in a manner that reflects the tripartite psychological distances of pitch classes, chords, and keys. From Lerdahl, 2001.
TPS also provides a method for deriving tree structures such as that in Figure 5.10, which serve as a hypothesis for the perceived dependencies between chords. Using the tree structure, one computes the distance of each chord from the chord to which it attaches in the tree, with the added stipulation that a chord “inherits” distances from the chords under which it is embedded. Thus each chord is associated with a numerical distance value from another chord. This distance plays an important role in predicting the perceived ebb and flow of tension in musical sequences, with the basic idea being that tension increases with tonal distance between chords. For example, when a chord is introduced from a new key area, tension increases (cf. Steinbeis et al., 2006). The numerical predictions of TPS can be compared to tension profiles produced by listeners who rate perceived tension over time in musical passages (cf. section 5.2.2, second subsection). Such experiments have provided support for TPS, and suggest that listeners do in fact hear relations between chords in a hierarchical rather than a purely sequential manner (Lerdahl & Krumhansl, 2007).
It is important to note, however, that the essential feature of TPS for the current purposes is not the tree structures it proposes. This is because one cannot simply assume that listeners hear long-distance harmonic relations in music, so that such tree structures are best viewed as hypotheses subject to empirical test, as previously mentioned (cf. section 5.3.1). Instead, the essential feature of TPS is that chord relations are perceived in terms of distances in a structured cognitive space of pitch classes, chords, and keys. Such harmonic distances apply even when chords are heard in a purely sequential way, so that each chord’s harmonic distance is computed from the immediately preceding chord (cf. section 5.2.2, second subsection). It is this notion of distance-based syntactic processing that provides a key link to language processing, as discussed in the next section.
We have seen that words can be difficult to integrate syntactically into sentences when they are distant from their dependents or when they are syntactically unexpected. In both cases, resources are consumed as part of constructing the structural interpretation of a sentence. In DLT, distant integrations are costly because they require reactivating a prior dependent word whose activation has decayed in proportion to the distance between the words. In expectancy theory, unexpected syntactic categories are costly because they require changing the preferred structural interpretation of a sentence, which amounts to boosting the activation of a structure that previously had a low activation level. In other words, both language theories posit that difficult integrations arise from activating low-activation items.
In music, as in language, a listener is continuously involved in building a structural interpretation of a sequence, including a sense of the local key.13 I hypothesize that when building a sense of key, a harmonically unexpected note or chord creates a processing cost due to its tonal distance (in the sense of TPS) from the current musical context. This cost arises because the incoming note or chord has a low activation level in the associative networks that store information about chord relationships (cf. Bharucha, 1987; Tillmann et al., 2000), yet its representation must be rapidly and selectively activated in order for it to be integrated with the existing context. In other words, harmonic distance translates into processing cost due to the need to activate a low-activation item. (Note that according to this idea, harmonically unexpected notes or chords are precisely those that are harmonically distant from the local key, because listeners tend to expect chords from the local key; cf. Schmuckler, 1989; Huron, 2006.)
Overlap in the syntactic processing of language and music can thus be conceived of as overlap in the neural areas and operations that provide the resources for difficult syntactic integrations, an idea termed the “shared syntactic integration resource hypothesis” (SSIRH). According to the SSIRH, the brain networks providing the resources for syntactic integration are “resource networks” that serve to rapidly and selectively bring low-activation items in “representation networks” up to the activation threshold needed for integration to take place (Figure 5.16).
The neural location of the hypothesized overlapping resource networks for language and music is an important question that does not yet have a firm answer. One idea consistent with current research on language processing is that they are in frontal brain regions that do not themselves contain syntactic representations but that provide resources for computations in posterior regions where syntactic representations reside (Haarmann & Kolk, 1991; Kaan & Swaab, 2002). Defining the neural locus of overlap will require within-subjects comparative studies of language and music using techniques that localize brain activity, such as fMRI. For example, if independent linguistic and musical tasks are designed with two distinct levels of syntactic integration demands within them, one could search for brain regions that show increased activation as a function of integration cost in both language and music (a technique known as “cognitive conjunction” neuroimaging; Price & Friston, 1997). These regions would be strong candidates for the overlapping resource networks proposed by the SSIRH.
Figure 5.16 Schematic diagram of the functional relationship between linguistic and musical syntactic processing. L = language, M = music. The diagram represents the hypothesis that linguistic and musical syntactic representations are stored in distinct brain networks, whereas there is overlap in the networks which provide neural resources for the activation of stored syntactic representations. Arrows indicate functional connections between networks. Note that the circles do not necessarily imply highly focal brain areas. For example, linguistic and musical representation networks could extend across a number of brain regions, or exist as functionally segregated networks within the same brain regions.
One appeal of the SSIRH is that it can reconcile the apparent contradiction between neuroimaging and neuropsychology described earlier in this section. With respect to neuroimaging, the SSIRH is consistent with the findings of Patel, Gibson, et al. (1998) under the assumption that the P600 reflects syntactic integration processes that take place in posterior/temporal brain regions (cf. Kaan et al., 2000). It is also consistent with localization studies that find that musical harmonic processing activates frontal language areas (Koelsch et al., 2002; Tillmann et al., 2003) under the view that these anterior loci house the resource networks that serve to activate representations in posterior regions. (It should be noted, however, that the precise localization of overlapping resource networks requires a within-subjects design comparing language and music, cf. above.)
With respect to neuropsychology, the SSIRH proposes that the reported dissociations between musical and linguistic syntactic processing in acquired amusia are due to damage to domain-specific representations of musical syntax (e.g., long-term knowledge of harmonic relations), rather than a problem with syntactic integration processes. Consistent with this idea, most such cases have been associated with damage to superior temporal gyri (Peretz, 1993; Peretz et al., 1994; Patel, Peretz, et al., 1998; Ayotte et al., 2000), which are likely to be important in the long-term representation of harmonic knowledge. The SSIRH also proposes that musico-linguistic syntactic dissociations in musical tone deafness (congenital amusia) are due to a developmental failure to form cognitive representations of musical pitch (Krumhansl, 1990). Consistent with this idea, research by Peretz and Hyde (2003) and Foxton et al. (2004) has revealed that congenital amusics have basic psychophysical deficits in pitch discrimination and in judging the direction of pitch changes. As discussed in section 4.5.2 (subsection “The Melodic Contour Deafness Hypothesis”), these problems likely prevent such individuals from forming normal cognitive representations of musical scale, chord, and key structure. Without such representations, there is no basis on which musical syntactic processes can operate. This is the reason why musical tone deafness is largely irrelevant to the study of music-language syntactic relations, as alluded to in section 5.4.1.
How can the SSIRH account for reports of aphasia without amusia, for example, a stroke that results in severe language impairment but spared musical abilities? As discussed in section 5.4.1, such reports often focus on case studies of individuals with extraordinary musical abilities, and may not be relevant to music processing in the larger population. Furthermore, most reports of aphasia without amusia are seriously out of date. For example, in Marin and Perry’s (1999) review of 13 cases, many cases were from the 1800s, and the most recent was from 1987. Needless to say, none contain any systematic tests of musical syntactic processing—e.g., harmonic processing of chords—in individuals with well-defined linguistic syntactic processing deficits. In fact, it is striking that there are no modern studies of harmonic processing in aphasia, despite suggestive older research (Francès et al., 1973). This is an area that merits careful study, and is a crucial testing ground for the SSIRH, as discussed in the next section.
A principal motivation for developing the SSIRH was to generate predictions to guide future research into the relation of linguistic and musical syntactic processing. One salient prediction regards the interaction of musical and linguistic syntactic processing. In particular, because the SSIRH proposes that linguistic and musical syntactic integration rely on common neural resources, and because syntactic processing resources are limited (Gibson, 2000), it predicts that tasks that combine linguistic and musical syntactic integration will show interference between the two. In particular, the SSIRH predicts that integrating distant harmonic elements will interfere with concurrent difficult syntactic integration in language. This idea can be tested in paradigms in which a harmonic and a linguistic sequence are presented together and the influence of harmonic structure on syntactic processing in language is studied. Several relevant studies are discussed in the following subsection.
A second prediction made by the SSIRH regards aphasia. Several language researchers have argued that syntactic comprehension deficits in Broca’s aphasia can be due to disruption of processes that activate and integrate linguistic representations in posterior language areas, rather than damage to these representations per se (Kolk & Friederici, 1985; Haarmann & Kolk, 1991; Swaab, Brown, & Hagoort, 1998; Kaan & Swaab, 2002). For these aphasics, the SSIRH predicts that syntactic comprehension deficits in language will be related to harmonic processing deficits in music. Relevant evidence is discussed in the second subsection below.
If music and language draw on a common pool of limited resources for syntactic processing, then one should observe interference between concurrent difficult musical and linguistic syntactic integrations. Testing this prediction requires experiments in which music and language are presented together. Although a number of past studies have paired linguistic and harmonic manipulations, they have largely focused on the relationship between musical harmonic processing and linguistic semantic processing. What is notable about these studies is that they either find a lack of interaction in processing (Besson et al., 1998; Bonnel et al., 2001) or report an interaction that is likely due to nonspecific factors having to do with attention (Poulin-Charonnat et al., 2005). These studies are briefly reviewed next as background for studies motivated by the SSIRH.
STUDIES EXAMINING THE INTERACTION BETWEEN LINGUISTIC SEMANTICS AND MUSICAL SYNTAX

Besson et al. (1998) and Bonnel et al. (2001) had participants listen to sung sentences in which the final word of the sentence was either semantically normal or anomalous, and sung on an in-key or out-of-key note. Besson et al. found that the language semantic violations gave rise to a negative-going ERP (N400), whereas the out-of-key notes gave rise to a late positive ERP, and that a simple additive model predicted the data for combined semantic/music syntactic violations quite well. Bonnel et al. had listeners either perform a single task (judge incongruity of final word or note) or a dual task (judge incongruity of both), and found that the dual task did not result in a decrease in performance compared to the single-task conditions. Thus both studies supported the independence of linguistic semantic versus musical syntactic processing.
Poulin-Charronnat et al. (2005), in contrast, did find an interaction between musical syntactic processing and linguistic semantic processing. They employed the harmonic priming paradigm, using chord sequences of the type introduced in section 5.2.3 (cf. Sound Example 5.2). Recall that these sequences are constructed so that the first six chords establish a musical context that determines the harmonic function of the last two chords. These two chords, which are physically identical in both contexts, form a V-I progression (a perfect cadence) in one context but a I-IV progression in the other. This leads to a sense of closure in the former context but not the latter.
To combine music and language, the chord sequences were sung (in four-part harmony) with one syllable per chord. The words formed sentences in which the final word was semantically either expected or unexpected. For example, the sentence “The giraffe had a very long . . .” could end either with “neck” (the expected ending) or “foot” (an unexpected ending). Participants were asked to listen to each sequence and decide if the last word was a real word or a nonsense word (in half the cases, the last word was in fact a nonsense word such as “sneck”). The focus of interest was on reaction times to real words. Based on standard psycholinguistic research, a faster RT was predicted for semantically expected versus unexpected words, reflecting semantic priming. The question of interest was whether this semantic priming effect would be modulated by the harmonic function of the final chord (I vs. IV, i.e., tonic vs. subdominant). Indeed, this is what was found. In particular, the RT difference between the expected and unexpected word was diminished when the final chord functioned as a subdominant chord. This result was obtained even for participants without musical training, showing that musical and linguistic processing interacted even in nonmusicians.
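In analysis-of-variance terms, this result is an interaction: the semantic priming effect (unexpected minus expected RT) shrinks when the final chord is a subdominant rather than a tonic. A sketch of that difference-of-differences logic, with RT means invented purely for illustration:

```python
# Toy sketch of the 2x2 design in Poulin-Charronnat et al. (2005):
# factor 1 = harmonic function of the final chord (tonic vs. subdominant),
# factor 2 = semantic expectancy of the final word.
# RT values (ms) are hypothetical, not the study's data.

rt = {
    ("tonic", "expected"): 600,
    ("tonic", "unexpected"): 700,
    ("subdominant", "expected"): 640,
    ("subdominant", "unexpected"): 690,
}

def priming_effect(chord):
    """Semantic priming effect: RT(unexpected) - RT(expected)."""
    return rt[(chord, "unexpected")] - rt[(chord, "expected")]

# Interaction = difference between the two priming effects. A nonzero
# value means harmonic function modulated semantic priming.
interaction = priming_effect("tonic") - priming_effect("subdominant")
```

A reduced priming effect in the subdominant condition (a positive interaction term here) is the pattern the study reported, in both musicians and nonmusicians.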
This study showed that a musical syntactic manipulation influenced linguistic semantic processing. However, the authors suggested that this effect might be mediated by general attentional mechanisms rather than by shared processing resources between language and music. Specifically, they proposed that the harmonic manipulation influenced linguistic semantics because the different chord endings affected the listeners’ attention to the final word/chord in different ways. For example, ending a chord sequence on a IV chord might draw attention to the music (because it sounds incomplete) and thus distract from language processing. This interpretation is supported by a study by Escoffier and Tillmann (2006), who combined harmonic-priming chord sequences with geometric visual patterns rather than with words (one pattern per chord), and showed that ending on a IV chord (vs. a I chord) slowed the speed of processing of the target pattern at the end of the sequence. Thus ending a sequence on a IV chord appears to have a nonspecific influence on the speed of responding to various kinds of stimuli (cf. Bigand et al., 2001).
The study of Poulin-Charronnat et al. raises an important issue for work on music-language syntactic interactions. Specifically, such work should control for indirect effects of music on language due to general attentional mechanisms (e.g., via the use of nonharmonic but attention-getting auditory manipulations in music).
STUDIES EXAMINING THE INTERACTION BETWEEN LINGUISTIC AND MUSICAL SYNTAX

We now turn to studies motivated by the SSIRH that combine musical harmonic manipulations with linguistic syntactic manipulations. Currently three such studies exist, one neural study (by Koelsch et al., 2005) and two behavioral studies (Fedorenko et al., 2009; Slevc et al., 2009).
Koelsch et al. (2005) conducted an ERP study in which short sentences were presented simultaneously with musical chords, with one chord per word (words were presented visually and in succession, at a rate of about 2 words per second). In some sentences, the final word created a grammatical violation via a gender disagreement. (The sentences were in German, in which many nouns are marked for gender. An example of a gender violation used in this study is: Er trinkt den kühlen Bier, “He drinks the[masculine] cool[masculine] beer[neuter].”) This final word violates a syntactic expectancy in language (cf. section 5.4.3, subsection on expectancy theory in language). The chord sequences were designed to strongly invoke a particular key, and the final chord could be either the tonic chord of that key or a (harmonically unexpected) out-of-key chord from a distant key (e.g., a D-flat major chord at the end of a C major sequence).14 The participants (all nonmusicians) were instructed to ignore the music and simply judge if the last word of the sentence was linguistically correct.
Koelsch et al. focused on early ERP negativities elicited by syntactically incongruous words and chords. Previous research on language or music alone had shown that the linguistic syntactic incongruities were associated with a left anterior negativity (LAN), whereas the musical incongruities were associated with an early right anterior negativity (ERAN; Gunter et al., 2000; Koelsch et al., 2000; Friederici, 2002). (Note that the degree of lateralization is stronger for the LAN than for the ERAN. While the ERAN is strongest over the right anterior hemisphere, it can be clearly observed over the left anterior hemisphere.) For their combined language-music stimuli, Koelsch et al. found that when sentences ended grammatically but with an out-of-key chord, a normal ERAN was produced. Similarly, when chord sequences ended normally but were accompanied by a syntactically incongruous word, a normal LAN was produced. The question of interest was how these brain responses would interact when a sequence had simultaneous syntactic incongruities in language and music.
The key finding was that the brain responses were not simply additive. Instead, there was an interaction: The LAN to syntactically incongruous words was significantly smaller when these words were accompanied by an out-of-key chord, as if the processes underlying the LAN and ERAN were competing for similar neural resources. In a control experiment for general attentional effects, Koelsch et al. showed that the LAN was not influenced by a simple auditory oddball paradigm involving physically deviant tones on the last word in a sentence. Thus the study supports the prediction that tasks that combine linguistic and musical syntactic integration will show interference between the two processes.
Turning to the behavioral study of Fedorenko et al. (2009), these researchers combined music and language using sung sentences. Linguistic syntactic integration difficulty was manipulated via the distance between dependent words. In sentence 5.4a below, the relative clause “that met the spy” contains only local integrations (cf. section 5.4.3, subsection on dependency locality theory). In sentence 5.4b, the relative clause “that the spy met” contains a nonlocal integration of “met” with “that,” known to be more difficult to process (e.g., King & Just, 1991).
(5.4a) The cop that met the spy wrote a book about the case.
(5.4b) The cop that the spy met wrote a book about the case.
The sentences were sung to melodies that did or did not contain an out-of-key note on the last word of the relative clause (underlined above). All the words in the sentences were monosyllabic, so that each word corresponded to one note. A control condition was included for an attention-getting but nonharmonically deviant musical event: a 10 dB increase in volume on the last word of the relative clause. After each sentence, participants were asked a comprehension question, and accuracy was assumed to reflect processing difficulty.
The results revealed an interaction between musical and linguistic processing: Comprehension accuracy was lower for sentences with distant versus local syntactic integrations (as expected), but crucially, this difference was larger when melodies contained an out-of-key note. The control condition (loud note) did not produce this effect: The difference between the two sentence types was of the same size as that in the conditions that did not contain an out-of-key note. These results suggest that some aspect of structural integration in language and music relies on shared processing resources.
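The logic of this interaction can be made concrete with a small sketch. The accuracy values below are hypothetical, chosen only to illustrate the pattern Fedorenko et al. report (a larger syntactic-distance cost under an out-of-key note); they are not the study's data.

```python
# Hypothetical comprehension-accuracy values (proportions correct), chosen
# only to illustrate the interaction pattern; not Fedorenko et al.'s data.
accuracy = {
    ("local",   "in-key"):     0.90,
    ("distant", "in-key"):     0.82,   # syntactic-distance cost alone
    ("local",   "out-of-key"): 0.88,
    ("distant", "out-of-key"): 0.70,   # cost grows with an out-of-key note
}

def distance_cost(acc, melody):
    """Drop in accuracy for distant versus local syntactic integrations."""
    return acc[("local", melody)] - acc[("distant", melody)]

# A positive difference of differences is the signature of the interaction:
# the syntactic cost is larger when the melody also demands harmonic integration.
interaction = distance_cost(accuracy, "out-of-key") - distance_cost(accuracy, "in-key")
```

Under a shared-resource account, the loud-note control condition would yield an interaction term near zero, because loudness deviance does not tax harmonic integration.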
The final study described here, by Slevc et al. (2009), manipulated linguistic syntactic integration difficulty via structural expectancy (cf. section 5.4.3, subsection on expectancy theory in language), and also directly compared the influence of musical harmonic manipulations on linguistic syntactic versus semantic processing. In this study, participants read sentences phrase by phrase on a computer screen. They controlled the timing of phrases by pushing a button to get the next phrase. In such studies, the amount of time spent viewing a phrase is assumed to reflect the amount of processing difficulty associated with that phrase. This “self-paced reading” paradigm has been widely used in psycholinguistic research. The novel aspect of Slevc et al.’s study was that each phrase was accompanied by a chord, so that the entire sentence made a coherent, Bach-style chord progression.
The sentences contained either a linguistic syntactic or semantic manipulation. In the syntactic manipulation, sentences like 5.5a included either a full or reduced sentence complement clause, achieved by including or omitting the word “that.” In sentence 5.5a, for example, omitting “that” results in the reduced complement clause “the hypothesis was being studied in his lab.” (Note: the vertical slashes in 5.5a and 5.5b below indicate the individual phrases used in the self-paced reading experiment.) In this case, when readers first encounter the phrase “the hypothesis”, they tend to interpret it as the direct object of “confirmed,” which causes syntactic integration difficulty when “was” is encountered, as this signals that “the hypothesis” is actually the subject of an embedded clause. In other words, the omission of “that” creates a “garden path” sentence with localized processing difficulty (on “was”) due to violation of a syntactic expectancy. In the semantic manipulation, sentences like 5.5b included either a semantically consistent or anomalous word, thereby confirming or violating a semantic expectancy. The chord played during the critical word (underlined below) was either harmonically in-key or out-of-key. (Out-of-key chords were drawn from keys 3-5 steps away on the circle of fifths from the key of the phrase.) Because out-of-key chords are harmonically unexpected, the experiment crossed syntactic or semantic expectancy in language with harmonic expectancy in music. The dependent variable of interest was the reading time for the critical word.
(5.5a) The scientist | wearing | thick glasses | confirmed (that) | the hypothesis | was | being | studied | in his lab.
(5.5b) The boss | warned | the mailman | to watch | for angry | dogs/pigs | when | delivering | the mail.
The main finding was a significant three-way interaction between linguistic manipulation type (syntactic or semantic), linguistic expectancy, and musical expectancy. That is, syntactically and semantically unexpected words were read more slowly than their expected counterparts; a simultaneous out-of-key chord caused substantial additional slowdown for syntactically unexpected words, but not for semantically unexpected words. Thus, processing a harmonically unexpected chord interfered with the processing of syntactic, but not semantic, relations in language. Once again, these results support the claim that neural resources are shared between linguistic and musical syntactic processing.
Taken together, the three studies reviewed above point to shared neural resources underlying linguistic and musical syntactic processing. They also suggest that studies examining concurrent processing of language and music, which have been relatively rare to date, are a promising area for exploring issues of modularity in both domains.
Remarkably, there has been virtually no work on musical syntactic processing in aphasia in modern cognitive neuroscience. This is particularly striking because an early study by Francès et al. (1973) suggested that aphasic individuals with linguistic comprehension disorders also have a deficit in the perception of musical tonality. The researchers studied a large group of aphasics and had them judge whether two short, isochronous melodies were the same or different. The melodies were either tonal or atonal. Under these circumstances, normal participants (even those with no musical training) show superior performance on the tonal stimuli. Aphasics failed to show this tonal superiority effect, leading the authors to suggest that the perception of tonality “seems to engage pre-established circuits existing in the language area” (p. 133).
This idea has lain fallow for decades, with no further studies of tonality perception in aphasia. Why might this be? Good tools for testing linguistic comprehension in aphasia and for probing the perception of tonal relations have long been available, yet no one has attempted to replicate or extend these results. This is made even more puzzling by the fact that the findings of Francès et al. were somewhat clouded by methodological issues, and naturally called for further work (cf. Peretz, 1993). It is likely that the absence of research on this topic reflects the emphasis on dissociations between aphasia and amusia (cf. section 5.4.1). However, given the caveats about such dissociations raised in section 5.4.1, and the predictions of SSIRH, it is clearly time to revisit this issue.
Patel, Iversen, Wassenaar, and Hagoort (2008) recently examined musical and linguistic syntactic processing in a population of 12 Broca’s aphasics (none of whom had been a professional musician). Broca’s aphasia is a type of aphasia in which individuals have marked difficulty with sentence production, though their speech comprehension often seems quite good. In fact, careful testing often reveals linguistic syntactic comprehension deficits. To check whether the aphasics we studied had such deficits, we employed a standard psycholinguistic test for syntactic comprehension. This “sentence-picture matching task” involves listening to one sentence at a time and then pointing to the corresponding picture on a sheet with four different pictures. Sentences varied across five levels of syntactic complexity. For example, a sentence with an intermediate level of complexity (level 3) was the passive structure: “The girl on the chair is greeted by the man” (Figure 5.17).
Figure 5.17 Example panel from the sentence-picture matching task for the sentence: “The girl on the chair is greeted by the man.”
Determining who did what to whom in such sentences relies on syntactic information (e.g., simple word-order heuristics such as “first noun = agent” do not work). The aphasics performed significantly worse than controls on this test, which established that they did indeed have a syntactic comprehension deficit in language. They were therefore an appropriate population for studying relations between linguistic and musical syntactic deficits.
To test music and language in a comparable fashion, we had the aphasics (and matched controls) perform acceptability judgments on musical and linguistic sequences. The linguistic sequences were sentences (n = 120): Half contained either a syntactic or a semantic error. For example, the sentence “The sailors call for the captain and demands a fine bottle of rum” contains a syntactic agreement error, whereas “Anne scratched her name with her tomato on the wooden door” is semantically anomalous. We tested both syntax and semantics in order to determine if musical syntactic abilities were specifically related to linguistic syntax. The musical sequences were chord sequences (n = 60): Half contained an out-of-key chord, violating the musical syntax (harmony) of the phrase. The musical task was thus comparable to a sour-note detection task, though it used chords instead of a melody of single tones. (The chord sequences were taken from Patel, Gibson, et al.’s 1998 ERP study, and represented the “in-key” and “distant-key” conditions—cf. section 5.4.2 for background on these stimuli, and Figure 5.12 for an example.) We also had the aphasics and controls do an experiment involving same/different discrimination of short melodies, to check if they had any auditory short-term memory problems for musical material.
All aphasics had left-hemisphere lesions, though the locations were variable and did not always include Broca’s area. Such variability is well known from studies of Broca’s aphasia (Willmes & Poeck, 1993; Caplan et al., 1996) and precluded us from addressing issues of localization. We focused instead on cognitive relations between music and language based on performance on tasks in both domains.
Two aphasics and one control performed poorly on the melodic same/different task, and were excluded from further analysis; the remaining aphasics and controls did not differ in their performance on the melodic task, indicating that the groups were matched on basic perception of tone sequences. Turning to the main results, the primary finding of interest was that the aphasics performed significantly worse than controls on detecting harmonic anomalies in chord sequences, indicating a deficit in the processing of musical tonality (Figure 5.18).
They also showed a severe deficit on the linguistic syntactic task, and an impairment on the linguistic semantic task, though this just escaped statistical significance. Figure 5.19 shows the data in a different way, permitting the performance of individuals on the music task to be compared to their performance on the two language tasks.
Figure 5.18 Performance of aphasics and controls on musical and linguistic tasks. The vertical axis shows percentage of hits minus percentage of false alarms in detecting harmonic, linguistic syntactic, and linguistic semantic anomalies. Error bars show 1 standard error.
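The bias-corrected accuracy measure used in Figure 5.18 (percentage of hits minus percentage of false alarms) can be computed as follows. The function and its trial format are a sketch for illustration, not the authors' analysis code.

```python
def hits_minus_false_alarms(trials):
    """Score anomaly-detection performance as % hits minus % false alarms.

    trials: list of (is_anomalous, judged_anomalous) boolean pairs,
    one pair per sequence presented to the listener.
    """
    anomalous = [judged for is_anom, judged in trials if is_anom]
    normal = [judged for is_anom, judged in trials if not is_anom]
    hit_rate = 100.0 * sum(anomalous) / len(anomalous)   # anomalies judged anomalous
    fa_rate = 100.0 * sum(normal) / len(normal)          # normal items judged anomalous
    return hit_rate - fa_rate
```

The correction matters because a listener who simply responds "anomalous" on every trial scores 0 rather than 100, so group differences cannot be explained by response bias alone.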
As can be seen, there is a good deal of overlap between aphasics and controls on the music task, suggesting that linguistic agrammatism was associated with a relatively mild impairment of tonality perception in this group. Indeed, the fact that most aphasics score within the normal range on the music syntax test raises a question. Is the observed aphasic group deficit in tonality perception simply due to a few individuals with lesions that affect separate brain areas involved in language and music (and hence who score poorly in both domains)? Or is the lower performance of the aphasics as a group indicative of some systematic degradation in their perception of tonality, related to linguistic agrammatism? One way to address this question is to look at the correlation between performance on the music task and the language syntax task. For the aphasics, the simple correlation was not significant, but interestingly, when the controls were included in the correlation (via a multiple regression analysis), performance on the music syntax task was a significant predictor of performance on the language syntactic task. This points to some process common to language and music syntax that operates in both the controls and the aphasics, though at a degraded level in the aphasics. Notably, when the same type of multiple regression analysis was conducted on music syntax versus language semantics, performance on the music task did not predict linguistic performance. Hence the putative shared process appears to link music syntax to language syntax rather than to language semantics.
Although the above study employed explicit judgments of tonality, it is also important to test musical syntactic abilities using implicit tasks. This is because research by Tillmann (2005) has shown that individuals with music syntactic deficits in explicit tasks can nevertheless show implicit access to musical syntactic knowledge. The task we used to tap implicit syntactic abilities is harmonic priming (Bharucha & Stoeckig, 1986). Harmonic priming is a well-studied paradigm in music cognition that tests the influence of a preceding harmonic context on the processing of a target chord (cf. sections 5.2.3 and 5.2.4). Much research has shown that a target chord is processed more rapidly and accurately if it is harmonically close to (vs. distant from) the tonal center created by the prime (Bigand & Pineau, 1997; Tillmann et al., 1998; Bigand et al., 1999; Justus & Bharucha, 2001; Tillmann & Bigand, 2001). Importantly, this advantage is due not simply to the psychoacoustic similarity of context and target, but to their distance in a structured cognitive space of chords and keys (Bharucha & Stoeckig, 1987; Tekman & Bharucha, 1998; Bigand et al., 2003). The harmonic priming effect thus indicates implicit knowledge of syntactic conventions in tonal music, and has been repeatedly demonstrated in nonmusician listeners in Western cultures (e.g., Bigand et al., 2003).
Figure 5.19 Relationship between performance on musical and linguistic tasks for aphasics (black dots) and controls (open circles). Separate best-fitting regression lines are shown for aphasics and controls. (A) shows relations between performance on the music task and the language syntax task, and (B) shows relations between performance on the music task and the language semantics task.
Patel, Iversen, and colleagues (2008) studied harmonic priming in Broca’s aphasia using a second group of 9 Broca’s aphasics (cf. Patel, 2005). (As in the first study, we first established that the aphasics had a syntactic comprehension deficit using the sentence-picture matching task. As in that study, all aphasics had left hemisphere lesions, but these did not always include Broca’s area.) We used the original two-chord version of the harmonic priming task, with a single chord serving as the prime (Bharucha & Stoeckig, 1986). Prime and target were 1 s long each, separated by 50 ms. This places minimal demands on attention and memory and is thus suitable for use with aphasics. The harmonic distance between prime and target was regulated by the circle of fifths for musical keys: Harmonically close versus distant targets were two versus four clockwise steps away from the prime on the circle, respectively. This directly pits conventional harmonic distance against psychoacoustic similarity, because the distant target shares a common tone with the prime (Tekman & Bharucha, 1998; Figure 5.20).
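The clockwise-distance manipulation can be sketched as follows. The 12-key ordering is the standard circle of fifths for major keys (enharmonic spellings chosen arbitrarily); this is an illustration of the distance metric, not the authors' stimulus-generation code.

```python
# Major keys in circle-of-fifths order, starting (arbitrarily) from C.
CIRCLE_OF_FIFTHS = ["C", "G", "D", "A", "E", "B", "F#", "Db", "Ab", "Eb", "Bb", "F"]

def clockwise_steps(prime, target):
    """Number of clockwise steps from the prime key to the target key."""
    i = CIRCLE_OF_FIFTHS.index(prime)
    j = CIRCLE_OF_FIFTHS.index(target)
    return (j - i) % len(CIRCLE_OF_FIFTHS)

# For a C major prime (as in Figure 5.20): the close target (D major)
# lies two steps away, the distant target (E major) four steps away.
close_distance = clockwise_steps("C", "D")
distant_distance = clockwise_steps("C", "E")
```

Note the psychoacoustic twist: the four-step target (E major: E-G#-B) shares the tone E with the C major prime (C-E-G), whereas the two-step target (D major: D-F#-A) shares none, so raw tone overlap and harmonic distance make opposite predictions.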
The participants’ task was to judge whether the second chord was tuned or mistuned (on 50% of the trials, it was mistuned by flattening one note in the chord). The main focus of interest, however, was the reaction time (RT) to well-tuned targets as a function of their harmonic distance from the prime. A faster RT to close versus distant chords is evidence of harmonic priming. (Prior to doing the priming study, the aphasics completed two short experiments that showed that they could discriminate tuned from mistuned chords and did not have auditory short-term memory deficits.)
Figure 5.20 Example of prime and target chords for the harmonic priming task. All chords were major chords, being the principal chord of a key from the circle of fifths. In this case, the prime is a C major chord. The close target is a D major chord, and the distant target is an E major chord.
The results were clear. Controls showed normal harmonic priming, with faster reaction times to harmonically close versus distant well-tuned targets. Aphasics, however, failed to show a priming effect, and even showed a nonsignificant trend to be faster on distant targets, suggestive of responses driven by psychoacoustic similarity rather than by harmonic knowledge (Figure 5.21).
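The group contrast amounts to a simple difference score: mean RT to distant targets minus mean RT to close targets, with positive values indicating normal priming. A sketch using the group mean RTs reported in the caption of Figure 5.21 (this arithmetic is illustrative; the published analysis was run on trial-level data):

```python
def priming_effect(rt_close, rt_distant):
    """Harmonic priming effect in seconds: positive means faster responses
    to harmonically close targets, the normal pattern for listeners."""
    return rt_distant - rt_close

# Group mean RTs to well-tuned targets, from the caption of Figure 5.21.
controls_effect = priming_effect(rt_close=0.99, rt_distant=1.05)  # positive: normal priming
aphasics_effect = priming_effect(rt_close=1.68, rt_distant=1.63)  # negative: no priming
```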
Thus aphasics with syntactic comprehension problems in language seem to have problems activating the implicit knowledge of harmonic relations that Western nonmusicians normally exhibit. Importantly, this deficit is not a generalized consequence of brain damage, because there are cases of individuals with bilateral cortical lesions who show normal harmonic priming (Tramo et al., 1990; Tillmann, 2005).
Together, these two aphasia studies point to a connection between linguistic and musical syntactic processing. The results are consistent with the SSIRH, which suggests that music and language share neural resources for activating domain-specific representations during syntactic processing. A deficiency in these resources appears to influence both music and language. From the standpoint of neurolinguistics, this supports a “processing view” of syntactic disorders in aphasia, that is, a general problem activating stored syntactic representations (e.g., verbs together with their lexical category and thematic role information), rather than a disruption of these representations (Kolk & Friederici, 1985; Haarmann & Kolk, 1991; Kolk, 1998; Swaab et al., 1998; Kaan & Swaab, 2002). From the standpoint of music cognition, the results are notable in suggesting that left hemisphere language circuits play a role in musical syntactic processing in nonprofessional musicians. Future work seeking to determine the importance of particular left hemisphere circuits to music processing will need to employ aphasics with more tightly controlled lesion profiles. It will be interesting to know if aphasics with a more uniform lesion profile (e.g., agrammatic aphasics with left frontal brain damage) would show even stronger links between performance on language and music tasks than found in the current experiments, which employed aphasics with rather variable lesion profiles.
Figure 5.21 Box plots for RT difference to harmonically distant versus close targets. Data are for correct responses to tuned targets. The horizontal line in each box indicates the median value, the slanted box edges indicate confidence intervals, and the upper and lower bounds of the box indicate interquartile ranges. Absolute mean RTs for controls (s.e. in parentheses): close targets .99 (.07) s, distant targets 1.05 (.06) s. Aphasics: close targets 1.68 (.22) s, distant targets 1.63 (.17) s.
From a broader perspective, the above results indicate that it is time to reawaken the study of music syntactic processing in aphasia, a topic that has experienced a 30-year hiatus since the pioneering work of Francès et al. (1973). Parallel studies of music and language offer a novel way to explore the nature of aphasic processing deficits (cf. Racette et al., 2006). Such research has clinical implications, and raises the intriguing possibility that it may ultimately be possible to model linguistic and musical syntactic deficits in aphasia in a common computational framework (cf. Tillmann et al., 2000; McNellis & Blumstein, 2001).
Roughly 30 years ago, Leonard Bernstein’s provocative lectures at Harvard sparked interest in cognitive comparisons between musical and linguistic syntax. Although his own ideas on the subject have not stood the test of time, his intuition of an important link is now being supported by modern research in cognitive neuroscience. This research suggests that although musical and linguistic syntax have distinct and domain-specific syntactic representations, there is overlap in the neural resources that serve to activate and integrate these representations during syntactic processing (the “shared syntactic integration resource hypothesis” [SSIRH]). Exploring this overlap is exciting because it provides a novel way to illuminate the neural foundations of syntax in both domains.
This chapter has attempted to illustrate that comparative research on musical and linguistic syntax should be grounded in a solid understanding of the important differences between the two systems. Understanding these differences need not deter comparative research. On the contrary, it should guide such research past the pitfalls that trapped earlier thinkers (including Bernstein). Once these traps are avoided, a large and fertile field for investigation is reached, a field that has only just begun to be explored. Such explorations are likely to be the most fruitful when they are hypothesis-driven and based on empirically grounded cognitive theory in the two domains. If this can be achieved, then the power of the comparative method in biology can be brought to bear on the human mind’s remarkable capacity for syntax.
1 I am dealing here with “substantive universals” of musical syntax: structural patterns that appear in most if not all widespread musical cultures. A different approach to the notion of musical universals is to consider the cognitive universals underlying music processing (cf. Lerdahl & Jackendoff, 1983; Jackendoff, 2002:75).
2 By including the notion of a tonal hierarchy in the concept of “scale,” I am using the term “scale” to include what Dowling (1978) has called “mode” in his discussion of scale structure in music.
3 It should be noted that in this study, melodies were not presented in transposition, because in the same-different task the children tended to respond “different” to transpositional changes.
4 The oval shape in the right part of the tree is meant to indicate that the short motif that projects upward to it from the left (the first four notes of the second phrase, C-C-G-G) is subordinate to both branches touched by the oval, not just the left branch (cf. Lerdahl & Jackendoff, 1983:138). This subtlety is not essential for the current discussion.
5 One important direction for future research using TPS is to design musical sequences in which predictions based on purely sequential relations are very different from predictions based on hierarchical relations (cf. Smith & Cuddy, 2003). It would be particularly interesting to determine if some listeners hear more sequentially, whereas others hear more hierarchically, and if this reflects the degree of musical training.
6 The superscript 6 refers to the fact that the standard II chord, in other words, D-F-A in C major, is played in an “inverted” form such that the F is the lowest note. Chord inversions are common in tonal music.
7 In GTTM, there is a kind of pseudoconstituency in prolongation-reduction trees, in that nodes in a tree have a distinct syntactic identity as a progression, a weak prolongation, or a strong prolongation (Lerdahl & Jackendoff, 1983:182). However, even in these cases, the chords are not really in a constituent relationship in a linguistic sense, in that one cannot define a priori what the grammatical categories of a constituent must be.
8 This discussion of harmonic functions in music highlights the need to make a clear conceptual distinction between the names of chords based on the scale tones on which they are built and the more abstract harmonic functions that chords fulfill. Unfortunately, music theory terminology is confusing in this regard, in that the same terms (such as tonic, dominant, and subdominant) are used to refer to chords on the basis of their tonal constituents and on the basis of their harmonic functions. One way to avoid this confusion would be to call chords by their Roman numeral names (e.g., a IV chord) when referring to their inner tone structure and by verbal labels (e.g., “subdominant”) when referring to their functions.
9 The fact that the P600 peaks after the N400 does not necessarily mean that semantic operations precede syntactic ones. ERPs are an indirect measure of brain activity that are especially sensitive to locally synchronous activity patterns. Although one can infer from ERPs that processing occurs no later than a certain time, one cannot infer from ERPs how early processing begins, because it is always possible that relevant processing commences before one sees the ERP, but is not synchronous enough to be detected.
10 Technically, the integration is between “sent” and an empty-category object that is coindexed with the pronoun “who.”
11 For those interested in detail, the distance between “The” and “reporter” is 1 because one new discourse referent (“reporter”) has been introduced since the site of integration on “The.” The distance between “reporter” and “who” is 0 because no new discourse referents have been introduced by “who,” and so forth. The critical difference between the sentences concerns the integration of “sent” into the existing sentence structure. In the upper sentence (which has a subject-extracted relative clause), “sent” has one integration (with “who”) with a distance of 1. In the lower sentence (which has an object-extracted relative clause), “sent” has two integrations (with “photographer” and “who”), the former having a distance of 1 and the latter having a distance of 2. This results in a total integration cost of 3 for “sent” in the lower sentence.
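The arithmetic in this footnote can be made concrete with a small sketch (my own construction, not Gibson's implementation): integration cost is simply a count of new discourse referents intervening between each integration site and the incoming word, with the word itself counted when it introduces a referent.

```python
# A minimal sketch of the footnote's arithmetic (illustrative only,
# not Gibson's actual model). Words that introduce a new discourse
# referent (nouns, the verb) are flagged 1; others are flagged 0.

def integration_cost(referent_flags, head, dependents):
    """Sum the distances (in new discourse referents) from each
    integration site in `dependents` to the word at index `head`,
    counting the head itself if it introduces a referent."""
    return sum(sum(referent_flags[dep + 1:head + 1]) for dep in dependents)

# Object-extracted relative clause:
# "The reporter who the photographer sent ..."
#   0     1       2   3      4        5
flags_obj = [0, 1, 0, 0, 1, 1]
print(integration_cost(flags_obj, 5, [4, 2]))  # 1 + 2 = 3, as in the footnote

# Subject-extracted relative clause: "The reporter who sent ..."
flags_subj = [0, 1, 0, 1]
print(integration_cost(flags_subj, 3, [2]))  # 1
```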
12 Gibson recognizes that equating integration cost with linear distance is a simplification. Other factors which may influence the integration cost of a word include the complexity and contextual plausibility of structural integrations that have taken place between the incoming word and its site of integration (cf. Gibson, 2000), and potential interference among items in the intervening region, including the incoming word (cf. Gordon et al., 2001). Furthermore, integration cost may reach an asymptote after a few intervening new discourse referents rather than remaining a linear function of distance. Nevertheless, equating cost with linear distance provides a good starting point and performs well in predicting behavioral data such as reading time data in sentence processing experiments.
13 Having a sense of key permits tones to be perceived in terms of varying degrees of stability, which contributes strongly to the dynamic perceptual quality of music.
14 This chord is also known as a Neapolitan 6th chord, and in longer musical contexts (i.e., in which it is not the final chord) it can function as a substitution for the subdominant chord.
Chapter 6
Meaning
6.1 Introduction
6.1.1 Music and Translation
6.1.2 What Does One Mean by “Meaning”?
6.2 A Brief Taxonomy of Musical Meaning
6.2.1 The Structural Interconnection of Musical Elements
6.2.2 The Expression of Emotion
6.2.3 The Experience of Emotion
6.2.4 Motion
6.2.5 Tone Painting
6.2.6 Musical Topics
6.2.7 Social Associations
6.2.8 Imagery and Narrative
6.2.9 Association With Life Experience
6.2.10 Creating or Transforming the Self
6.2.11 Musical Structure and Cultural Concepts
6.3 Linguistic Meaning in Relation to Music
6.3.1 Music and Semantics
The Semantics of Leitmotifs
Neural Evidence That Music Can Evoke Semantic Concepts
6.3.2 Music and Pragmatics
Cognitive Aspects of Discourse Coherence
Neural Aspects of Discourse Coherence
6.4 Interlude: Linguistic and Musical Meaning in Song
6.5 The Expression and Appraisal of Emotion as a Key Link
6.5.1 Acoustic Cues to Emotion in Speech and Music
6.5.2 Neural Aspects of Auditory Affect Perception
6.5.3 Cross-Domain Influences
6.5.4 Emotion in Speech and Music: Future Directions
6.6 Conclusion
Note: Throughout this chapter, “language” refers to the ordinary language of everyday communication, not poetry, philosophy, or other specialized forms of discourse. “Music” refers to instrumental music—music without words—unless otherwise stated. The reasons for comparing instrumental music to ordinary language are rooted in cognitive science, and are discussed in Chapter 1.
The relationship between linguistic and musical meaning has a paradoxical character. On the one hand, consider the much-cited remark of Claude Lévi-Strauss that music is “the only language with the contradictory attributes of being at once intelligible and untranslatable”1 (Lévi-Strauss, 1964:18). That is, although it is possible to translate between any two human languages with reasonable fidelity,2 it makes little sense to think of translating music into language (e.g., a Mozart symphony into words), or music into music (e.g., a Beethoven chamber music piece into a Javanese gamelan work) and expect that the meaning of the original material would be preserved.
On the other hand, music crosses cultural boundaries more easily than language does. That is, it is possible to encounter music from another culture and to take to it very quickly.3 (In contrast, listening to speech in a novel foreign language is unlikely to sustain one’s interest for long unless one has studied the language.) I have had this experience with several kinds of music, including Japanese koto music and Javanese gamelan music. In each case, I heard things in the music that I had never heard before, and felt compelled to return to the music to experience the new thoughts and feelings it inspired.
How can these apparently contradictory facts about music and language be explained? Let us first consider the issue of the translatability of language versus music. At a very general level, one can view different languages as different ways of achieving the same thing: the transmission of certain basic types of meanings between individuals. For example, all languages allow speakers (1) to make propositions, in other words, to refer to specific entities or concepts (such as “cousin” or “justice”) and to predicate things about them, (2) to express wishes and ask questions, and (3) to make metalinguistic statements, such as “In my language we do not have a word that means exactly the same thing as ‘justice’ in your language” (Hermerén, 1988).
Music does not bear these kinds of meanings. Furthermore, ethnomusicological research suggests it is unlikely that different musics are different ways of transmitting any basic, common set of meanings (Becker, 1986). Even within the confines of Western culture, the diversity of music belies simplistic ideas such as “musical meaning is always about X,” in which X represents any single concept. Consider, for example, the assertion that the meaning of instrumental music is “always about emotion.” Asserting that emotion is the sine qua non of musical meaning is problematic, because there are cases in which a listener can find music meaningful as a brilliant play of form without any salient sense of emotion (my own response to Bach fugues falls in this category).
Broadening the view from Western music to the diversity of human cultures, it is difficult if not impossible to assign a universal set of meanings to music. This is why music, unlike language, cannot be translated without significantly changing the meaning of the original material. Consider the example given above of a Beethoven chamber music piece versus a Javanese gamelan piece. Let us assume for the sake of argument that these pieces are comparable in terms of the size of the musical ensemble (i.e., number of musicians and instruments). Let us further assume that these pieces are heard in Europe and Indonesia, respectively, by communities of knowledgeable listeners who know little about the music of other cultures. What would it mean to try to “translate” the Beethoven piece into a gamelan piece or vice versa? Of course, one could transfer certain surface features: The Western orchestra could play the Javanese melodies and the gamelan could play the Beethoven melodies. Yet even here there would be challenges, such as differences in musical scales and a loss of the original timbres and textures. Unlike with language, these sonic changes due to translation are not merely incidental to meaning. The meaning of the Beethoven piece is intimately tied to the particular sounds used to create it—the scales and their attendant harmonies, the timbre of the string instruments, and so forth—and the same can be said of the gamelan piece. Furthermore, even if a highly talented composer managed to create a Western piece that captured some of the features of gamelan music, this is no guarantee that the European audience hearing the “translated gamelan” would experience meanings at all similar to the Indonesian audience hearing real gamelan music (and vice versa), because of differences in listening habits, culture, and context. 
The difficulties of musical translation can be contrasted with the comparative ease of translating between languages and having both audiences derive similar meanings from the resulting translations (e.g., translating an Indonesian news broadcast into German).
Having discussed the difficulty of musical translation, let us now consider the seemingly paradoxical observation that music can be more readily appreciated across cultural boundaries than language when there is no translation, in other words, when the listener hears foreign sounds of which they have little or no previous experience. Of course, it is by no means guaranteed that a listener from one culture will appreciate the music of another. However, the salient fact is that such a thing can occur. Thus the relevant question is “How is this possible?” One idea is that appreciation of foreign music is purely sensual: Unfamiliar timbres may “tickle the ear,” but there is no real sense of the music’s structure or significance. In this view, the foreign music is simply experienced as a pleasant and somewhat evocative sonic goulash. There are empirical reasons to treat this view with skepticism, because research has shown that listeners unfamiliar with a new music are nevertheless sensitive to the statistical distributions of tones within the music, and can infer some structural relations on this basis (Castellano et al., 1984; Kessler et al., 1984; Oram & Cuddy, 1995; Krumhansl et al., 1999, 2000).
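The kind of statistical sensitivity these studies report can be illustrated with a sketch in the spirit of the Krumhansl-Schmuckler key-finding procedure (the function names and the toy melody are my own; the major-key profile values are the probe-tone ratings from Krumhansl & Kessler, 1982): a pitch-class histogram of the heard tones is correlated with rotations of the profile, and the best-fitting rotation is taken as the inferred key.

```python
# Illustrative sketch, not a claim about any specific cited study.
# Pitch classes: 0 = C, 1 = C#, ... 11 = B.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]  # Krumhansl & Kessler (1982)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def best_major_key(pitch_classes):
    """Return the tonic (0-11) of the major key whose profile best
    correlates with the melody's pitch-class histogram."""
    hist = [pitch_classes.count(pc) for pc in range(12)]
    return max(range(12),
               key=lambda tonic: pearson(
                   hist, [MAJOR_PROFILE[(pc - tonic) % 12] for pc in range(12)]))

# "Twinkle, Twinkle" opening in C major: C C G G A A G  F F E E D D C
melody = [0, 0, 7, 7, 9, 9, 7, 5, 5, 4, 4, 2, 2, 0]
print(best_major_key(melody))  # 0, i.e., C major
```

Because the procedure uses only the distribution of tones, it works on any transposition of the melody, which is the sense in which mere exposure to tone statistics can support key inference.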
Another possible basis for cross-cultural appreciation of music is that one hears sonic relationships based on one’s own culture-specific listening habits. For example, one might hear rhythmic groups and melodic motives, though these may not correspond to the groups and motives heard by a native listener (Ayari & McAdams, 2003). In Javanese gamelan music, this can happen when a Western listener parses a rhythmically strong event as the “downbeat” or beginning of a rhythmic pattern, when native listeners hear it as the end of the pattern. The key point, however, is that even though one’s parsing of the music may be “wrong” from the standpoint of a cultural insider, one may nevertheless find the music rewarding because one perceives patterns and sensibilities that are not just trivial variants of those known from one’s native music.
Finally, one may actually perceive musical relations in a manner resembling that of a native listener. This could occur because certain patterns occur in both one’s native music and in the new music, such as a steady beat or the tendency for a structural “gap” in a melody to be “filled” by subsequent notes (Meyer, 1973; cf. Krumhansl et al., 2000). Intuitively, however, it seems unlikely that naïve listening will yield results similar to culturally informed listening, especially if one considers more complex structural relations and the cultural significance of musical phenomena.
Let us focus on one explanation offered above, namely the idea that one can perceive structure and significance in unfamiliar music even if one’s parsing of the music is not the same as a native listener’s. The fact that this can occur suggests that there is an aspect of musical meaning that is purely formal. That is, unlike language, music can have meaning for a listener simply because of its perceived sonic logic, without knowledge of the context that gave rise to the music or of an insider’s understanding of its cultural significance (Rothstein, 1996).
Thus music can have the paradoxical properties of being untranslatable yet at times still able to cross cultural boundaries without translation. In both of these respects, musical meaning is quite unlike linguistic meaning. Is it therefore the case that linguistic and musical meaning are incommensurate, and that comparative empirical research is unlikely to be fruitful? Not at all. As we shall see, certain aspects of the relationship between linguistic and musical meaning can be studied in an empirical fashion. Such comparative studies are worthwhile because they can contribute to the ongoing scholarly dialogue about musical versus linguistic meaning, and can help shed light on human cognitive and neural function.
Any systematic comparison of meaning in music and language presupposes that one has a clear idea of what one means by linguistic and musical “meaning.” On the linguistic side, it may come as a surprise to some readers that there is considerable debate over what linguistic meaning is and how it is constituted (Langacker, 1988; Partee, 1995). For the purposes of this book, we will not delve into these debates and will treat linguistic meaning in a fairly elementary way. Linguistic meaning can be broadly divided into two areas: semantics and pragmatics. Semantics concerns how words and sentences reflect reality or our mental representation of reality, and its units of analysis are propositions. Pragmatics concerns how listeners add contextual information to sentences and draw inferences about what has been said (Jaszczolt, 2002). Most comparative discussions of linguistic and musical meaning focus on semantics, though as we shall see in section 6.3.2, it may ultimately prove more fruitful to focus on pragmatics.
What of musical meaning? The introduction of this chapter discussed musical meaning without defining it, leaving the definition to the reader’s intuition. For scientific research, however, this is unsatisfactory. A natural place to turn for help in defining musical meaning is the work of music theorists and philosophers of aesthetics, who have written (and continue to write) a great deal on the topic (e.g., Meyer, 1956; Cooke, 1959; Coker, 1972; Rantala et al., 1988; Kivy, 1990; 2002; Nattiez, 1990; Davies, 1994; 2003; Cumming, 2000; Monelle, 2000; Koopman & Davies, 2001; Kramer, 2002; Zbikowski, 2002, among many other books and articles). No consensus has emerged from these writings for a definition of musical meaning, although several positions have been clearly articulated and carefully argued. Broadly speaking, these positions can be arrayed along a conceptual axis with regard to how specific versus general the term “meaning” is taken to be. An example of a position with a very specific view is that of the philosopher Kivy (2002), who argues that “meaning” should be reserved for the linguistic sense of reference and predication. In this view, music does not have meaning. Indeed, Kivy (personal communication) argues that music “cannot even be meaningless,” because music does not have the possibility of being meaningful in this linguistic sense. In this view, asking whether music has meaning is like asking whether a rock is dead: It is a “category error.” (The concept of “dead” does not apply to rocks because only things that are alive have the possibility of being dead.) Kivy readily acknowledges that music can have “significance” and “logic” (in syntactic terms), and that music can express emotion. In rejecting the notion of musical meaning, he is taking a highly specific view of “meaning,” perhaps in order to avoid diluting the term to the point at which it ceases to make useful distinctions in discussing the varieties of human thought.
At the other end of the spectrum, the music theorist and ethnomusicologist Jean-Jacques Nattiez has argued for an inclusive use of the term “meaning.” Nattiez (1990) considers meaning to be signification in the broadest semiotic sense: Meaning exists when perception of an object/event brings something to mind other than the object/event itself. Implicit in this view is the notion that language should not be taken as the model of signification in general (Nattiez, 2003). Also implicit is the idea that meaning is not a property of an object/event, because the same object/event can be meaningful or meaningless in different circumstances, depending on whether it brings something else to mind (cf. Meyer, 1956:34). That is, “meaning” is inherently a dynamic, relational process. I would add to Nattiez’s view the idea that meaning admits of degrees, related to the number and complexity of things brought to mind. For example, in some circumstances, the sound of a piano piece may mean nothing more to a listener than “my neighbor is practicing,” whereas in other circumstances, the same piece may engage the same listener’s mind in intricate mentations related to musical structure, emotion, and so forth.
From the standpoint of comparative research on language and music, both Kivy’s and Nattiez’s positions are interesting. The first position focuses our attention on meaning as semantic reference, and leads us to ask what the closest musical analog of this might be. As we shall see in section 6.3.1, recent research in cognitive neuroscience is pertinent to this issue. The second view encourages us to think about the variety of ways in which musical elements bring other things to mind. In the spirit of the second approach, section 6.2 provides a brief taxonomy of musical meaning, to explore the different ways in which music can be meaningful. This is followed by a section on linguistic meaning in relation to music, which discusses linguistic semantics and pragmatics. The final section of the chapter explores the expression and appraisal of emotion as a key link between spoken language and music.
The following is a brief review of 11 types of musical meaning that have been discussed by scholars of music. This is not an exhaustive list, but it touches on many of the major areas of discussion and serves to illustrate the breadth of meanings conveyed by instrumental music. The list is arranged such that more “intramusical” meanings are discussed first: Meanings become more and more “extramusical” (relating to things outside the music) as one moves down the list. The focus is on Western European music unless otherwise stated, and comparisons with linguistic meaning are made whenever possible.
It should be noted that an interesting line of research on musical meaning concerns the study of film music, which is not reviewed here because our focus is on purely instrumental music (see Cohen, 2001, and Tagg & Clarida, 2003, for research on this topic).
The purest form of intramusical meaning exists when musical elements bring other musical elements to mind. The prototypical example of this is musical expectation, when heard elements bring other, expected elements to mind (Meyer, 1956). This is illustrated by the “gap-fill” pattern discussed by Meyer (1956, 1973), in which a large interval in a melody (a “gap”) leads to an expectation for successive pitches that will “fill” this interval by traversing the gapped space in small steps (cf. Huron, 2006:84-85).
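As a rough illustration of the gap-fill schema (my own sketch, with arbitrary thresholds rather than anything taken from Meyer or Huron), the pattern can be operationalized as a large opening leap followed by small steps in the opposite direction that stay within the gapped pitch space:

```python
# Illustrative gap-fill detector over MIDI pitch numbers.
# Thresholds (5 semitones for a "gap", 2 for a "step") are assumptions
# chosen for the example, not values from the literature.

def is_gap_fill(pitches, gap_threshold=5, step_max=2):
    """True if the melody opens with a leap of at least gap_threshold
    semitones and then moves by small steps back through the gap."""
    if len(pitches) < 3:
        return False
    gap = pitches[1] - pitches[0]
    if abs(gap) < gap_threshold:
        return False  # opening interval too small to count as a gap
    lo, hi = sorted((pitches[0], pitches[1]))
    for prev, cur in zip(pitches[1:], pitches[2:]):
        step = cur - prev
        # steps must be small, directed against the leap, and stay
        # inside the gapped pitch space
        if abs(step) > step_max or step * gap > 0 or not lo <= cur <= hi:
            return False
    return True

# Leap C4 -> A4 (9 semitones), then stepwise descent filling the gap
print(is_gap_fill([60, 69, 67, 65, 64, 62, 60]))  # True
```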
Expectations in music can come from multiple sources. Some expectations are thought to be based on auditory universals related to Gestalt properties of audition (cf. Narmour, 1990), such as an overall expectation for small pitch intervals. Expectations also arise via learning of style-specific aspects of music, such as the chord progressions typical of Western tonal music (cf. Chapter 5), and from piece-specific regularities, such as the realization that a particular antecedent phrase is answered by a particular consequent phrase in the current piece. In each of these cases, the type of meaning generated is music-internal: It refers to nothing outside of the music itself. Meyer (1956) has called this form of meaning “embodied” meaning, which exists when a stimulus “indicates or implies events or consequences that are of the same kind as the stimulus itself” (p. 35). Meyer contrasted this with “designative” meaning, which exists when a stimulus “indicates events or consequences which are different from itself in kind, as when a word designates or points to an object which is not itself a word” (p. 35). Following Meyer, different scholars have used different terms to refer to the same basic distinction, for example, Nattiez’s (1990) “intrinsic versus extrinsic referring,” and Jakobson’s (1971) “introversive versus extroversive semiosis.” I will use the terms “intramusical” and “extramusical,” because “embodied” has a specific connotation in modern cognitive neuroscience that is distinct from what Meyer intended.
A focus on intramusical meaning is a characteristic of the “absolutist” or “formalist” approach to musical aesthetics. As noted above, expectation is an important process in creating formal meaning. Kivy (2002:78) suggests another process, analogous to a game of hide and seek, whereby a listener seeks to identify previously heard themes, which may be more or less disguised via their embedding in larger musical textures. Although expectation is a forward-directed process in which current musical elements bring future ones to mind, musical “hide and seek” is a recollective process in which current elements bring past elements back to mind. In both cases, meaning exists because the mind reaches out beyond the present moment to make contact with other musical elements in the imagination or in memory.
A forerunner of the absolutist position was Eduard Hanslick, whose short and influential book The Beautiful in Music (1854) argued that the study of musical aesthetics should be rooted in musical structure, and not in the emotions of listeners (which he argued were quite capricious). Hanslick proposed that “the essence of music is sound in motion,” and offered the analogy of a kaleidoscope in describing musical beauty:
When young, we have probably all been delighted with the ever-changing tints and forms of a kaleidoscope. Now, music is a kind of kaleidoscope, though its forms can be appreciated only by an infinitely higher ideation. It brings forth a profusion of beautiful tints and forms, now sharply contrasted and now almost imperceptibly graduated; all logically connected with each other, yet all novel in their effect; forming, as it were, a complete and self-subsistent whole, free from all alien admixture. (p. 48)
Hanslick’s analogy of music as ever-shifting and self-contained forms nicely captures a formalist view of music. There is a flaw in his analogy, however, that leads to an interesting psychological question for the formalist position. In a kaleidoscope, the successive patterns are not really logically connected: One pattern merely follows another without any sense of progress or direction (Kivy, 2002:62-63). In contrast, a sense of logical connectedness in music is central to the formalist approach to musical meaning. Formalist theories often posit structural relations at time scales ranging from a few adjacent notes to thousands of notes, in other words, up to time scales spanning an entire piece (which can take many tens of minutes to perform). Human cognition is known for its limited capacity to remember and process information in short-term memory, raising the question of the relevance of large-scale form to intramusical meaning from a listener’s perspective.
Whether large-scale relations are in fact part of a listener’s apprehension of a piece of music has been the focus of an interesting line of empirical research. One early and influential study in this vein is that of Cook (1987b), who focused on an aspect of structure known as “tonal closure.” Tonal closure exists when music starts and ends in the same key, and is characteristic of most of the music in the Western European tradition. Music theorists have posited that departure from the initial key of a piece contributes to musical tension, and that this component of tension is not resolved until the home key is reached again (e.g., Schenker, 1979; Lerdahl & Jackendoff, 1983). This idea naturally predicts that listeners should distinguish between pieces that start and end in the same key versus in different keys, perhaps preferring the former as sounding more coherent or aesthetically satisfying. Cook tested this idea by transposing the final section of a number of classical piano pieces to a different key, so that the starting and ending key were different. Undergraduate music students heard the original and manipulated pieces and were asked to rate them on four aesthetic scales, including “coherence” and “sense of completion.”
Cook found that only for the shortest piece (1 minute in length) did the listeners rate the original piece higher than the altered one. Cook notes that because most classical pieces are considerably longer than 1 minute, theories of large-scale tonal unity might not be relevant to a listener’s perceptual experience. Cook wisely points out that this does not mean that such theories are musically irrelevant; they may help reveal how composers designed their work, thus addressing the piece’s conceptual structure (rather than its perceptual structure). What is notable, however, in both Cook’s study and subsequent studies that have manipulated musical form by scrambling the order of sections in a piece, is the lack of sensitivity to large-scale structure, even in highly trained musicians (e.g., Karno & Konecni, 1992; Deliège et al., 1996; Tillmann & Bigand, 1996; Marvin & Brinkman, 1999). At least one philosopher who favors a formal approach to music has argued that the structural relations we apprehend in music are much more localized in time than has been generally assumed (Levinson, 1997), perhaps limited to about 1 minute in duration (see Tan & Spackman, 2005, for some relevant empirical data).
From the standpoint of comparison with language, these studies of musical manipulation are quite interesting, because doing comparable experiments with language would likely produce quite a different result. For example, imagine taking a magazine article that requires tens of minutes to read, scrambling the order of the paragraphs or sections, and then asking readers to judge whether the original or scrambled version is more coherent. There is little doubt that readers would favor the original. This suggests that unless sequences have the qualities of semantic reference and predication, the mind can forge structural relations within such sequences over only a rather limited timescale.4
As described so far, the absolutist position is quite “cold,” focusing only on structure and devoid of any discussion of music’s expressiveness or of the dynamic feelings it engenders. For many people, these latter aspects are fundamental to musical meaning. Absolutists recognize the importance of emotion in music, and have found different ways of making a link to emotion. Meyer (1956) argues that music creates emotion in a listener when an expectation is not fulfilled. Because this type of affect is inextricably intertwined with expectation, it can be considered a part of intramusical meaning (p. 39). That is, Meyer’s emotion is a type of musical affect reflecting a transient arousal due to a dynamic process, and is not the same as emotion in the everyday sense of the term (e.g., happiness, sadness), because it lacks a positive or negative valence (though it may receive a post hoc interpretation in terms of valence). As we shall see in section 6.2.3, this view has found interesting support in empirical studies of music and emotion.
A related way of reconciling formalism and emotion is to focus on the feelings of tension and resolution that music engenders. A formalist could argue that these feelings are part of intramusical meaning, because they do not depend on mental connection to concepts or phenomena in the outside world, growing instead out of music-internal factors such as harmonic syntax (cf. Chapter 5 for an extended discussion of musical syntax and tension, including empirical studies). Jackendoff (1991) argues that these feelings are an important part of musical affect. Once again, it is worth noting that this form of affect is unlike everyday emotions: It is a dynamic feeling that is essentially unvalenced (Sloboda, 1998).
A different approach to linking formalism and emotion is taken by Kivy (1980, 2002) and Davies (1980, 1994), who argue that music is capable of expressing everyday human emotions (such as happiness, sadness) by virtue of its form. The details of the link between form and emotion will be explored in the next section; for the moment, the relevant point is that this approach effectively treats emotion as an extramusical issue, because music somehow brings to mind the emotions of everyday human life.
Any scientific discussion of emotion and music requires a conceptual distinction between the expression of emotion by music and the experience of emotion by listeners. The former refers to the affective qualities of a musical piece as judged by a listener (i.e., the “mood” projected by the music), whereas the latter refers to a listener’s own emotional reaction to music. The crucial point is that these are distinct phenomena (cf. Juslin & Laukka, 2004). It is quite possible, for example, for a listener to judge a piece as “sounding sad” but to have no emotional reaction to it. This section deals with the expression of emotion by music, and the next section deals with the experience of emotion by listeners.
Contemporary scientific studies of musical expression typically take the following form. Listeners are presented with pieces of instrumental music and are asked to select from a list of a few simple emotional categories—such as happiness, sadness, anger, and fear—the one that best captures the emotions expressed by each piece (Gabrielsson & Juslin, 1996; Krumhansl, 1997; Balkwill & Thompson, 1999; Peretz, Gagnon, et al., 1998; Peretz, 2001). Listeners may also be asked to give a numerical rating of how strongly the piece expresses the chosen emotion or other emotions. The experimenters choose pieces that they feel exemplify these different categories, or use pieces in which a performer was explicitly instructed to play in a manner that expresses one of the basic emotions. (They may also create pieces by manipulating musical parameters of existing pieces, such as tempo or mode [major/minor].) Experiments of this sort, using music from the Western European tradition, have found broad agreement among listeners in judging expressive qualities in music (see Gabrielsson & Lindström, 2001, for a thorough review). Indeed, when the categories are restricted to just two (e.g., happy vs. sad) and the stimuli are chosen appropriately, adults can reliably judge affect after hearing less than a second of music (Peretz, Gagnon, et al., 1998, cf. Bigand et al., 2005), and children can make reliable affective judgments of extended passages by the age of 5 or 6 (Gerardi & Gerken, 1995; Dalla Bella et al., 2001).
Although the consistency of affective judgments in such studies is impressive, one might object that research that focuses on only a few emotion categories is not very realistic in terms of the affective subtleties of music. One way to address this problem is to offer listeners a greater range of emotion terms to choose from. Hevner (1936, 1937) set an important precedent in this regard: She had a list of nearly 70 adjectives that listeners could check to indicate the affective characteristics of a piece. She arranged these adjectives into eight clusters to create an “adjective circle” in which relatedness between clusters was reflected by distance along the circle (Figure 6.1).
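Hevner’s circular arrangement implies a simple relatedness metric: the fewer steps between two clusters along the circle, the more related their adjectives. A minimal sketch of that circular-distance idea (the numeric cluster indices below are illustrative positions only, not Hevner’s own labels):

```python
# Circular distance between clusters on an 8-cluster "adjective circle":
# relatedness decreases as the shortest path along the circle grows.

def circle_distance(a, b, n_clusters=8):
    """Shortest number of steps between clusters a and b along the circle."""
    d = abs(a - b) % n_clusters
    return min(d, n_clusters - d)

# Adjacent clusters (e.g., 1 and 2) are one step apart...
print(circle_distance(1, 2))  # 1
# ...while clusters on opposite sides of the circle are maximally distant.
print(circle_distance(1, 5))  # 4
# The circle wraps around, so the last and first clusters are also adjacent.
print(circle_distance(8, 1))  # 1
```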
Gregory and Varney (1996) also used an extensive list, and showed that listeners chose different adjectives for different types of music, rather than using the same basic adjectives for all music (cf. Nielzén & Cesarec, 1981). Perhaps the best way to simultaneously give listeners freedom of choice while also constraining choices to some degree (to facilitate empirical analysis) is to conduct a preliminary study in which listeners supply their own adjectives, and then to conduct a second experiment using rating scales based on frequently chosen adjectives (cf. Gregory, 1995).
There is another reason to be cautious about studies that offer listeners only a few simple emotion terms to describe music. Musicologists who have studied expression in a serious way often suggest affective categories that are more nuanced than simple labels such as “happy” and “sad.” For example, in his widely read book The Language of Music, Cooke (1959) argued that composers of Western European tonal music shared a “vocabulary” of 16 musical figures that expressed different types of affect. For example, he argued that the pattern 5-6-5 in minor (in which the numbers indicate scale degrees) gives “the effect of a burst of anguish. This is the most widely used of all terms of musical language: one can hardly find a page of ‘grief’ music by any tonal composer of any period without encountering it several times” (p. 146). Cooke supported this claim with examples from a 14th-century Belgian chanson, from the “Virgin’s Cradle Song” by Byrd, from Mozart’s Don Giovanni, Beethoven’s Fidelio, and so forth. This example illustrates Cooke’s general approach of identifying a musical figure and then presenting an impressive array of examples over historical time, expressing (according to Cooke) similar affective meaning.
Cooke felt that the affective associations of his “basic emotional vocabulary” were rooted in the tonal properties of the figures (their particular intervals and harmonic implications), and that they became conventional and widely understood by both composers and listeners via centuries of consistent use in tonal music. One perceptual study with modern listeners has failed to support Cooke’s hypothesis (Gabriel, 1978), though it should be noted that this study may not have been an entirely fair test due to methodological issues (Sloboda, 1985:62-64). Although Cooke’s ideas have been a favorite target of criticism by many philosophers and music theorists, the extent to which Western European listeners would agree on the emotions expressed by the musical figures he investigated remains an open question.5 For the moment, however, the salient point is that Cooke is an example of a thoughtful humanist who studied musical expression and found that far more than four or five basic terms are needed to adequately describe the emotions expressed by music.
Figure 6.1 The Hevner adjective circle. The underlined term in each cluster was chosen by Hevner as a key term, broadly representing the adjectives in each cluster. From Gabrielsson & Lindström, 2001, and Farnsworth, 1954.
Although recognizing the limitations of studies of expression based on just a few predetermined emotional categories, we will focus on such studies here because they have been the most sophisticated in terms of analyzing cues to musical expressiveness. That is, they have simplified one problem (the number of emotional categories) in order to focus efforts on another complex problem: the relation between perceived expressiveness and specific acoustic properties. This has allowed good progress to be made, and one hopes that future work will expand the number of categories so that research can be both conceptually and technically sophisticated.
Some of the cues that have proven to be important in the perception of musical expression are tempo, pitch register, and timbre (see Box 6.1 and references in Balkwill & Thompson, 1999, p. 48, and cf. Gabrielsson & Lindström, 2001, Table 10.2 for more detail). For example, music with a fast tempo, high average pitch, and bright timbre is much more likely to be identified as expressing “happiness” than “sadness,” and vice versa. In contrast to these cues, which do not depend on the particular tonal structure of Western music, other affectively relevant cues do reflect the peculiarities of the Western European musical system. The most obvious of these is the conventional link between major keys and positive emotions, and minor keys and negative emotions, a link to which even very young children in Western culture are sensitive (Gerardi & Gerken, 1995). One important point, emphasized by Gabrielsson and Lindström (2001), is that the influence of any given factor depends on what other factors are present, and a good deal more work is needed to study how factors interact in the perception of expressive qualities.
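The cue pattern just described can be caricatured as a toy decision rule. The thresholds, the equal weighting of cues, and the function name below are invented for illustration; they are not drawn from any of the cited studies, which also stress that cues interact rather than simply summing:

```python
# Toy sketch of the cue pattern described above: fast tempo, high average
# pitch, and bright timbre bias judgments toward "happy"; the reverse
# pattern biases toward "sad". All thresholds and weights are invented.

def judged_mood(tempo_bpm, mean_pitch_hz, brightness):
    """Return 'happy' or 'sad' from three coarse cues (brightness in 0-1)."""
    score = 0
    score += 1 if tempo_bpm > 120 else -1      # fast vs. slow tempo
    score += 1 if mean_pitch_hz > 300 else -1  # high vs. low register
    score += 1 if brightness > 0.5 else -1     # bright vs. dull timbre
    return "happy" if score > 0 else "sad"

print(judged_mood(150, 440, 0.8))  # happy-sounding cue pattern -> happy
print(judged_mood(60, 150, 0.2))   # sad-sounding cue pattern -> sad
```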
Demonstrating that there are reliable links between particular musical cues and judgments of musical mood is an important step in the scientific study of musical expressiveness, but it is just the first step. From a cognitive perspective, two questions are especially salient. First, what is the mental basis for each observed link? Second, how universal are these links across musical cultures?
In answering the first question, one obvious candidate is speech. Listeners are quite good at judging the affective qualities of a voice, in other words, judging the mood it expresses independent of the lexical meanings of the words. There is a sizable body of research on the acoustic cues to vocal affect, and some of the findings are quite parallel to those found in research on perception of musical affect (Johnstone & Scherer, 2000). For example, tempo and average pitch height are important cues to affect in both speech and music: Sad voices and musical passages are likely to be slower and lower pitched than happy ones. The link between cues to emotion in speech and music has been the focus of research effort in recent years, and will be discussed in more detail in section 6.5.1. For now, suffice it to say that language may be one important source of music’s expressive power.
Box 6.1 Some Cues Associated With Judgments of Musical Expressiveness in Western European Tonal Music
There are, of course, other possible bases for links between acoustic cues and perceived expression. Both Kivy (1980) and Davies (1980), for example, suggest that musical patterns can resemble the physical bearing or movements of people in different emotional states. Thus, for example, slow, sagging musical themes might get their expressive power from the way they resemble the physical movements of a depressed person (cf. Clynes, 1977; Damasio, 2003). According to this view, certain cues are experienced as expressive simply by virtue of their resemblance to human bearings or expressions. Davies (2003:181) offers the analogy of the face of a basset hound, which is often perceived as expressive of sadness not because the dog itself is sad, but because its physiognomy resembles in some ways the visage of a sad person.
In addition to links with vocal affect and body image, another possible source of expressiveness in music is a metaphorical relation between structure and emotion. For example, empirical research has shown that simple melodic structure is often linked with judgments of happiness, and complex structure with judgments of sadness, at least in Western listeners (Nielzén & Cesarec, 1981; Balkwill & Thompson, 1999). It is difficult to relate these facets of music to vocal affect or to aspects of human comportment. They may instead reflect a form of metaphorical understanding whereby happiness is considered a simpler mental state, whereas sadness is considered more complex and multifaceted (as intimated in the famous opening line of Tolstoy’s Anna Karenina).
The second question posed above, about the universality of links between particular cues and particular emotions, is actually two independent questions. First, do the emotional categories that have been proposed for Western European tonal music apply cross-culturally? Second, if other cultures can be found in which it makes sense to talk about music expressing sadness, happiness, and so forth, how consistent is the mapping between musical cues and perceived affect? For example, might one find a culture in which slow tempi are associated with happy music and fast tempi with sad music? With regard to the first question, it is known that non-Western musical traditions exist that posit that instrumental music can have expressive qualities. One well-studied case is that of Indian classical music, in which different ragas (musical compositions characterized by a particular scale, tonal hierarchy, and melodic gestures) are claimed to express different characteristic moods, or “rasas” (Becker, 2001). Another case is Javanese gamelan music, which also recognizes that different compositions can have quite different affective content. Recently, Benamou (2003) conducted an interesting comparative analysis of affective terms applied to Western versus Javanese music by experts within each culture. For Western music, he took the eight key terms from Hevner’s adjective circle, such as “dignified,” “sad,” “serene,” (see Figure 6.1). For Javanese music, he proposed a set of six Javanese affect terms derived from field work with musical experts in Java. In some cases, there was a fairly good semantic mapping between Western and Javanese affect terms: For example, the Javanese musical affect terms “regu” and “sedih” include “dignified” and “sadness” in their connotations, respectively. In other cases, however, terms did not map from one culture to another in a straightforward way. 
Benamou’s work alerts us to the fact that one cannot simply assume that musical affective categories are uniform across cultures. Only those affective categories shared by different cultures are appropriate for addressing the consistency of the link between musical cues and musical expressiveness.
Fortunately, such affective categories do exist and have been the focus of cross-cultural research on cues to expression. One such line of research has been pioneered by Balkwill and colleagues, based on fieldwork in multiple countries and on perceptual experiments. For example, Balkwill and Thompson (1999) examined Western listeners’ perception of mood in Indian ragas. These listeners had no knowledge of Indian classical music, and heard 12 ragas that were meant to express four different emotions according to the Indian rasa system: hasya (joy), karuna (sadness), raudra (anger), and shanta (peace). The ragas were performed on sitar or flute (i.e., there was no vocal component). For each raga, listeners rated which emotion was expressed the most strongly, and then gave a numerical rating of how strongly each of the four emotions was expressed in that raga. They also rated each piece for tempo, rhythmic complexity, melodic complexity, and pitch range. The results revealed that listeners could identify the intended emotion when it was joy, sadness, or anger, even though they were naive with respect to the Indian classical tradition. Using correlation and regression techniques, Balkwill and Thompson showed that ratings of joy were associated with fast tempo and low melodic complexity, ratings of sadness were associated with slow tempo and high melodic complexity, and ratings of anger were related to timbre: Listeners were more likely to identify the “harder” timbre of the sitar (a stringed instrument) with anger than the softer timbre of the flute.
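The correlational logic of such an analysis can be sketched in a few lines. The per-piece ratings below are fabricated placeholders for illustration only; they are not Balkwill and Thompson’s data:

```python
# Sketch of the correlational logic in studies like Balkwill & Thompson
# (1999): per-piece cue ratings (e.g., tempo) are correlated with per-piece
# emotion ratings (e.g., joy). All numbers are invented for illustration.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical mean ratings for six pieces (1-7 scales).
tempo_ratings = [6.1, 5.8, 5.5, 3.2, 2.9, 2.5]  # 1 = slow, 7 = fast
joy_ratings   = [5.9, 5.2, 4.8, 2.1, 1.9, 1.5]  # 1 = low joy, 7 = high joy

# A strong positive r would indicate that faster-sounding pieces
# tended to receive higher joy ratings.
print(round(pearson_r(tempo_ratings, joy_ratings), 2))
```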
Balkwill and Thompson (1999) argue for a distinction between “psychophysical” and “culture-specific” cues to musical expressiveness. They regard their findings as evidence for the cross-cultural nature of psychophysical cues to expression. In contrast, sensitivity to culture-specific cues presumably requires enculturation in a particular tonal system. This would predict, for example, that the affective quality of the major/minor distinction in Western music is evident only to those who have grown up listening to this music and who have formed associations between these modes and expressive content (e.g., via hearing songs in which words with affective content are paired with music). As noted above, a link with affective cues in speech might be one reason that certain musical cues to affect are perceived in a consistent way across cultures. This topic is discussed in more detail in section 6.5.
Some of the oldest written observations about music concern its power over the emotions. More than 2,000 years ago, Plato claimed in The Republic that melodies in different modes aroused different emotions, and argued that this was such a strong influence on moral development that society should ban certain forms of music (cf. Holloway, 2001). Many other writers have affirmed music’s emotional power, and there is ample evidence for the importance of emotion in the musical experience of many contemporary listeners (Juslin & Sloboda, 2001). To take just one example, psychological studies of music’s role in everyday life reveal that listening to music is often used as a way of regulating mood (Denora, 1999; Sloboda & O’Neill, 2001). Given the pervasive link between music and emotional response, it may come as a surprise that the scientific study of this topic is a relatively young endeavor. Undoubtedly, one reason for this is the challenge of measuring emotion in a quantitative and reliable fashion: Emotions associated with music may not have any obvious behavioral consequence, can be fleeting and hard to categorize, and can vary substantially between individuals listening to the same music. Although these challenges make the empirical study of musical emotion difficult, they do not make it impossible, as evidenced by a growing stream of publications from researchers who have sought creative ways of tapping into the moment-by-moment emotional experience of listeners.
In this brief section, it is impossible to provide a thorough review of empirical studies of emotion in music (see Juslin & Sloboda, 2001). Hence I focus instead on one question of cognitive interest: Are the emotions experienced during music listening the same as the everyday emotions of human life (e.g., happiness, sadness, anger), or does music elicit emotions that do not fit neatly into the preestablished everyday categories (cf. Scherer, 2004)? To answer this question, one needs to first provide a list of everyday emotions, in other words, emotions that are part of the normal business of living life, and whose basic role in human existence is attested by their presence cross-culturally. Emotion researchers have offered different lists based on different criteria (though it should be noted that some researchers are critical of the idea of basic emotions altogether, for example, Ortony & Turner, 1990). Based on universal facial expressions, Ekman et al. (1987) have proposed the following list: anger, disgust, fear, joy, sadness, and surprise. Other researchers have included emotions such as shame, tenderness, and guilt in their lists. For the purposes of this section, I will take happiness, sadness, anger, and fear as basic human emotions. Can music elicit these emotions? How can one determine scientifically if in fact a listener is experiencing these emotions?6
In a relevant study, Krumhansl (1997) had participants listen to six 3-minute instrumental musical excerpts chosen to represent sadness, fear, or happiness. (For example, Albinoni’s Adagio in G minor was used for sadness, Mussorgsky’s Night on Bare Mountain for fear, and Vivaldi’s “La Primavera” from The Four Seasons for happiness.) One group of participants listened to all pieces and rated them for sadness; another group rated all pieces for happiness, and a third group rated the pieces for fear. Crucially, participants were instructed to judge their own emotional reactions to the pieces, not the emotion expressed by the music. They indicated how strongly they felt the target emotion in a continuous fashion by moving a slider on a computer screen while listening. Krumhansl found that the musical selections did produce the intended emotions according to participant ratings.
In addition to this behavioral study, Krumhansl also conducted a physiological study in which a separate group of participants listened to the same pieces but gave no behavioral responses. Instead, a dozen different aspects of their physiology were monitored during music listening, including cardiac interbeat interval, respiration depth, and skin conductance. Many of these measures reflected the amount of activation of the autonomic (sympathetic) nervous system. Furthermore, physiological parameters changed depending on the dominant emotion (e.g., cardiac interbeat interval was lowest—i.e., heartbeat was fastest—for the happy excerpts), and these changes were fairly (but not totally) consistent with physiological changes associated with emotions elicited in other, nonmusical contexts using dynamic media, such as film and radio plays (cf. Nyklícek et al., 1997). Overall, Krumhansl’s results suggest that everyday emotions can be aroused by instrumental music, and raise interesting questions about the psychological mechanisms by which this takes place. In particular, although it is known that music can express different emotions via the use of specific cues (cf. section 6.2.2), why does this lead people to respond with a similar emotion (vs. a “distanced” perception of mood)? Does the reaction reflect something about human capacity for empathy? Or is it an example of contagion, in other words, the passive spreading of an affective response (cf. Provine, 1992)?
Accepting that music can elicit everyday emotions, is it also possible to demonstrate that there are musical emotions that are not simple counterparts of the emotions of everyday life? Such a view has been proposed by a number of philosophers, including Susanne Langer (1942), who argues that “a composer not only indicates, but articulates subtle complexes of feeling that language cannot name, let alone set forth” (p. 222). More recently, Raffman (1993) has also argued for “ineffable” musical feelings, related to the flow of tension and resolution in music.7 Psychologists have also been concerned with the possibility that musical emotions are not adequately captured by the few categories typically employed in nonmusical emotion research. Scherer (2004) has cautioned about the dangers of forcing musical emotion judgments into a “corset of a pre-established set of categories” (p. 240). Other psychologists and musicologists have argued that although music can evoke everyday emotions, there is another aspect to musical emotion that is more personal and complex, and perhaps ultimately more important. This is listeners’ active use of music in a process of “emotional construction,” in other words, in creating an emotional stance that helps define their attitude toward aspects of their own life (Denora, 2001; Sloboda & Juslin, 2001; cf. Cook & Dibben, 2001). This “constructive” view of musical emotion is relatively young and presents significant challenges for empirical research, but is worthy of the effort.
Is there any scientific evidence that music can evoke emotions distinct from everyday emotions? Consider, for example, the sense of admiration or wonder evoked by hearing a musical work of great beauty. Intuitively, these emotions certainly seem different from the everyday emotions we experience when listening to speech (cf. Davies, 2002; Gabrielsson & Lindström Wik, 2003). Although providing a scientific basis for this intuition is difficult, there is a related phenomenon that may offer some support for emotions induced by music but not by ordinary speech. This concerns a physiological response known as “chills” or “shivers down the spine,” which many people have experienced when listening to music. (Music is among the most common sources of chills, but people can also get chills from other stimuli experienced as deeply moving, including visual and literary art; Goldstein, 1980.) Sloboda (1991) conducted a survey of 83 listeners, about half of whom were trained musicians, examining the occurrence of chills and other physiological responses to music. A substantial number of respondents had experienced chills, including to purely instrumental pieces. Sloboda asked the participants to identify as accurately as they could the particular pieces and passages in which chills or other responses occurred. The participants were often able to locate quite precisely those regions associated with physiological responses (interestingly, these responses were often noted to occur quite reliably, in other words, without habituation even after dozens of listenings). Via musical analysis, Sloboda was able to show that there was a relationship between musical structure and physiological response. Chills, for example, were often associated with sudden changes in harmony, which can be regarded as a violation of expectancy (Meyer, 1956).8
Chills are interesting because they are clearly an emotional response, but they do not resemble the everyday emotions such as happiness or sadness. Everyday emotions are generally recognized as having at least two psychological dimensions: a dimension of valence (positive vs. negative) and a dimension of arousal (more vs. less aroused; Russell, 1989). Chills can perhaps be interpreted as a transient increase in arousal, but they do not seem to have any intrinsic valence (i.e., they can occur with “happy” or “sad” music, suggesting that any interpretation in terms of valence is post hoc and context dependent).9 Furthermore, chills are not normally a physiological concomitant of everyday emotions, as evidenced by the fact that some perfectly normal people do not experience them at all (Goldstein, 1980).
One scientific way to study whether chills are distinct from everyday emotions is to determine if brain responses to chills engage different brain regions than responses to emotions such as happiness and sadness. Blood and Zatorre (2001) conducted a neural study of chills in which listeners heard self-selected, chill-inducing instrumental music while brain activity in different regions was measured with positron emission tomography (PET). (The occurrence of chills was confirmed by physiological measures such as heart rate and respiration.) The brain regions where activity was positively correlated with chills included deep brain structures associated with reward and motivation, including the ventral striatum and dorsal midbrain (cf. Menon & Levitin, 2005). These same areas are known to be active in response to other biologically rewarding stimuli, including food and sex, and to be targeted by certain drugs of abuse. These fascinating findings revealed that music can engage evolutionarily ancient brain structures normally involved in biologically vital functions. Unfortunately, Blood and Zatorre did not also measure brain responses to “everyday emotions” elicited by music, so that no direct comparison of brain areas associated with chills versus everyday emotions is currently available. However, there is now a good deal of work on the neural correlates of everyday emotions (e.g., Damasio, 1994, 2003; LeDoux, 1996; Panksepp, 1998; Lane & Nadel, 2000), and an experiment involving a direct comparison of the type envisioned here should be possible.
Let us assume for the moment that chills are neurally distinct from everyday emotions, as seems likely. What would this imply with regard to the relationship between language and music? As noted in the previous section, listeners are sensitive to the affective qualities of the voice (independent of semantic content), but the types of affect expressible by the voice seem to be limited to everyday emotions such as happiness and anger. Thus although speech may be able to elicit an affective response from a listener (e.g., via empathy), it is not clear if it can elicit a response equivalent to “chills.” (The semantic content of speech, in combination with affective/rhetorical qualities of the voice, can lead to chills. For example, many people report getting chills when listening to the famous “I Have a Dream” speech by Martin Luther King, Jr. However, here we are discussing the affective qualities of the voice independent of the lexical meaning of the words.)10 If the sound of ordinary speech cannot evoke chills, this would imply that musical sounds can evoke emotions that speech sounds cannot.
One remarkable and oft-noted property of music is its ability to evoke a sense of motion in a listener (Zohar & Granot, 2006). One example of this is our tendency to synchronize to a musical beat, a response that appears to be uniquely human (cf. Chapters 3 and 7). However, even music without a strong beat can bring to mind a sense of motion. Clarke (2001) discusses this aspect of music in the framework of an “ecological” approach to sound, whereby sounds are perceived as specifying properties of their sources. (This framework is based on J. J. Gibson’s [1979] approach to the psychology of vision; cf. Bregman, 1990; Windsor, 2000; Dibben, 2001; Clarke, 2005.) Clarke (2001) argues that because sounds in everyday life specify (among other things) the motional characteristics of their sources, “it is inevitable that musical sounds will also specify movements and gestures, both the real movements and gestures of their actual physical production . . . and also the fictional movements and gestures of the virtual environment” (p. 222). As one example, Clarke points to a particular orchestral crescendo in Alban Berg’s opera Wozzeck, in which the combination of a constant pitch with a continuous change in timbre and dynamics yields an auditory sense of “looming,” suggestive of an imminent collision (cf. Seifritz et al., 2002).
Clarke suggests that music can evoke two kinds of motion: a sense of self-motion (as emphasized by Todd, 1999, who appeals to neurophysiological arguments) and a sense of external objects moving in relation to the self or to one another (cf. Friberg & Sundberg, 1999; Honing, 2005). One appealing aspect of Clarke’s ideas is his concern with empirical validation. For example, he points out that one way to test whether musical motion is a perceptual (vs. a metaphorical) phenomenon is to see if it interferes with tasks that require the perception or production of actual motion.
It is interesting to note that within speech science, there is also a theory inspired by ecological acoustics. “Direct realist theory” (Fowler, 1986; Fowler et al., 2003) claims that the objects of speech perception are not acoustic events but phonetically structured articulatory gestures that underlie acoustic signals. Thus the notion of sound as a medium for the perception of movement is shared by theories of speech and music perception, though it should be noted that the link between sound and perceived movement is subject to a fair amount of controversy in speech science (e.g., Diehl et al., 2004).
“Tone painting” or “sound painting” refers to the musical imitation of natural phenomena. These can include environmental sounds, animal sounds, or human sounds. Examples of the former two categories are found in Beethoven’s Symphony No. 6 (“Pastoral”). In the second movement, the songs of the nightingale and cuckoo are represented by flute and clarinet melodies, and in the fourth movement a furious thunderstorm is depicted using all the sonic resources of the orchestra. An example of an imitation of a human sound is the “sighing figure” used by many composers, including Mozart in his Fantasia in D minor, K.397.
With tone painting, composers are purposefully trying to bring something to mind that lies outside the realm of music. However, as noted by many theorists, skillful tone painting is never just a simple imitation of natural sounds: The musical components must also make sense in the larger structural framework of a piece. Tone painting that lacks this musical sensitivity typically arouses the ire of critics. Langer (1942) cites one 18th-century critic (Hüller) as complaining that “Our intermezzi … are full of fantastic imitations and silly tricks. There one can hear clocks striking, ducks jabbering, frogs quacking, and pretty soon one will be able to hear fleas sneezing and grass growing” (p. 220).
According to Ratner (1980), “From its contacts with worship, poetry, drama, entertainment, dance, ceremony, the military, the hunt, and the life of the lower classes, music in the early 18th century developed a thesaurus of characteristic figures, which formed a rich legacy for classic composers” (p. 9). Ratner designated these figures “topics,” a term meant to capture the idea that they were subjects for musical discourse. Ratner identified a number of topics, including dance forms, hunt music, and the pastoral topic. He argued, for example, that the dance form of the minuet “symbolized the social life of the elegant world, [whereas] the march reminded the listener of authority” (p. 16). Pastoral topics were simple melodies (e.g., characteristic of the music of shepherds), and presumably called to mind ideas of innocence and a connection to nature. The idea of topic has proved fruitful for music theorists (see, for example, Gjerdingen, 2007; Agawu, 1991; Monelle, 2000; Hatten, 2004), and has also attracted attention from cognitive psychologists of music. Krumhansl (1998) conducted a perceptual study of topics in which listeners heard a string quintet by Mozart (or a string quartet by Beethoven) and provided real-time judgments on one of three continuous scales: memorability, openness (a sense that the music must continue), and emotion. Krumhansl compared these responses to a detailed analysis of topics in these pieces by Agawu (1991), who identified 14 topics in the Mozart piece and 8 in the Beethoven (Figure 6.2).
Krumhansl found that the perceptual responses were correlated with the timing of the topics in the music. For example, in the Mozart piece, the “Pastoral” and “Sturm und Drang” (storm and stress) topics were associated with openness and memorability, likely due to the fact that these topics frequently occurred at the beginning of major sections or subsections in the music.
It is noteworthy that Krumhansl’s subjects were unfamiliar with the musical pieces, and many had little musical training (in fact, the results of the study showed no effect of musical expertise). Presumably, for her listeners the topics did not bring to mind their original cultural associations. In other words, modern listeners likely did not activate the meanings that a knowledgeable 18th-century listener might have activated. Nevertheless, Krumhansl’s study suggests that the topics still play a role in the psychological experience of the piece.
What is the significance of “topic theory” from the standpoint of language-music comparisons? Krumhansl suggests that there may be a parallel between topical structure in music and language. In both of the pieces she studied, different topics were repeated at various delays, which she argues “seems compatible with Chafe’s (1994) notion that topics within conversations can be maintained for a time in a semiactive state, ready to be reactivated later” (p. 134). That is, discourse analysis suggests that a process of reactivating semiactive information is basic to human linguistic discourse; Krumhansl suggests that via the use of topics, instrumental music may play on this process in artful ways. This idea may help explain the intuition of certain musically sensitive people that the sound of a string quintet or quartet is somehow reminiscent of a well-wrought “conversation” between several individuals.
Figure 6.2 An analysis of topics in a Mozart string quintet. From Krumhansl, 1998. Th1 = Theme 1, Th2 = Theme 2, Dev = Development, Recap = Recapitulation.
Another form of musical topic, the leitmotif, is also relevant to language-music comparisons. A leitmotif is a musical figure used in association with a particular character, situation, or idea, permitting a composer to use music to bring something nonmusical to mind. Because leitmotifs are primarily used in opera or film (in which an association can be formed by juxtaposing music with characters and situations), they will not be discussed further here, where our focus is on purely instrumental music; they are taken up again in the cognitive and neural section.
Instrumental music does not exist in a vacuum. Different types of instrumental music (e.g., “classical” music vs. bluegrass) are associated with different cultures, contexts, and classes of people. Music can bring these social associations to mind, and empirical research has shown that this aspect of musical meaning can influence behavior. This research includes studies of how consumers behave in the presence of different kinds of background music. In one study, Areni and Kim (1993) played classical music versus Top 40 selections in a wine shop, and found that customers bought more expensive wine when classical music was played. In another study, North et al. (1999) played French versus German folk music on alternate days in the wine section of a supermarket. (The French music was mainly accordion music, and the German music was played by a Bierkeller [beer-hall] band, primarily on brass instruments.) They arranged the French and German wines on adjacent shelves, marked by national flags to help clue the shoppers to the origin of the wines. They found that on “French music days,” the French wines outsold the German wines, and vice versa.
Another form of social association in instrumental music is the linking of music to ethnic or group identity. For example, Stokes (1994, cited in Gregory, 1997) “describes the boundary between ‘Irish’ and ‘British’ identities in Northern Ireland as being patrolled and enforced by musicians. The parades of Protestant Orangemen with fife and drum bands define the central city space in Belfast as the domain of the Ulster Protestants and of British rule. On the other hand ‘Irish traditional’ music, which is often considered ‘Catholic’ music, is widely played in bars and clubs, and defines the Catholic areas of the city” (p. 131).
In language, speech sounds are also associated with different cultures, contexts, and classes of people via the perception of a speaker’s accent. (Note that accent is distinct from dialect: The former refers to a particular way of pronunciation, whereas the latter refers to a language variant distinguished by pronunciation, vocabulary, and sentence structure.) Sociolinguists have demonstrated that a speaker’s accent can influence a listener’s beliefs about a speaker’s educational background, ethnic identity, and so forth (Honey, 1989; cf. Labov, 1966; Docherty & Foulkes, 1999). Thus a tendency to link sound with social associations appears to be shared by language and music.
It is informally attested that listening to instrumental music can evoke mental imagery of nonmusical phenomena, such as scenes from nature or “half forgotten thoughts of persons, places, and experiences” (Meyer, 1956:256). There is also informal evidence that listening can give rise to forms of narrative thought, for example, a sense that a particular piece of music reflected “the feeling of a heroic struggle triumphantly resolved” (Sloboda, 1985:59). Indeed, one early tradition of interpretive writing about instrumental music was based on the idea that instrumental music was a kind of “wordless drama” (Kivy, 2002). These suggestions have recently been supported by empirical research in which listeners were asked to listen to short orchestral pieces (e.g., Elgar’s “Theme” from the Enigma Variations) and to draw something that visually described what they were hearing (Tan & Kelly, 2004). Participants also wrote a short essay to explain the drawing they made. Musicians tended to create abstract (nonpictorial) representations that focused on structural aspects such as repetition and theme structure, whereas nonmusicians were much more likely to draw images or stories, often with salient affective content.
Although there is no obvious connection between music and language in terms of imagery, it does seem plausible that a “narrative tendency” in music perception is related to our constant trafficking in coherent linguistic narratives, whereby sound structure is used to infer a sequence of logically connected events in the world, linked by cause and effect.
If a particular piece or style of music is heard during an important episode in one’s life, that music can take on a special, personal meaning via the memories it evokes. Typically, these sorts of associations revolve around songs (which have a verbal narrative element), but there are also documented cases of instrumental music being a powerful vehicle for “taking one back in time,” in other words, leading to vivid recall of what one was thinking and feeling at a particular point in life (Denora, 2001; Sloboda et al., 2001).
Sociologists and ethnomusicologists have helped draw attention to the use of music to construct (vs. express) self-identity (Becker, 2001; Denora, 2001; Seeger, 1987). In this view, listening to music is not simply a matter of experiencing one’s everyday emotions or the worldview one already has; it is a way of changing one’s psychological space, and an “opportunity to temporarily be another kind of person than one’s ordinary, everyday self” (Becker, 2001:142). In Western culture, these aspects of music seem to be particularly important to adolescents, for whom music can be part of the processes of identity formation. For example, North et al. (2000) found that British boys’ primary stated reason for listening to music was not to regulate mood but to create an impression on others, even though the majority of respondents said that they listened to music on their own. Thus the music seemed to be playing a role in the construction of identity. Of course, songs probably formed the bulk of the music listened to by these boys, meaning that the identity-forming process is really a product of both language and music. One wonders if instrumental music can be as strong a force in identity formation as can music with words.
Although the construction of identity is a process that unfolds over years, in certain circumstances, music can play a key role in rapidly transforming the sense of self via an altered state of consciousness known as trance (Becker, 2001, 2004). Trance occurs in many cultures, for example, among Sufi mystics in Iran, ritual dancers in Bali, and Pentecostal Christians in America. Although some skeptics regard trance as a form of fakery, cross-cultural similarities in trance behavior suggest that trancers really are in an altered neurophysiological state. Such commonalities include difficulty remembering what happened during the trance and a greatly increased pain threshold during the trance (i.e., amnesia and analgesia). Furthermore, extensive and careful fieldwork with trancers in different cultures makes it seem very likely that the phenomenon in question is psychologically real (Becker, 2004; cf. Penman & Becker, submitted), a view supported by recent preliminary electroencephalographic (EEG) data from Balinese dancers during a trance ceremony (Oohashi et al., 2002). In this pioneering study, the researchers used an EEG telemetry system and compared the brain waves of a dancer who went into trance versus two other dancers who did not, and found a markedly different pattern of brain waves in the former individual (increased power in the theta and alpha bands).
Although music plays an important role in trance, its relationship to trance is complex, and there is no evidence that music can put a listener in a trance without the right context and the listener’s implicit consent. The music is typically part of a larger ritual event that includes words (e.g., the words of Sufi or Pentecostal songs or of Balinese drama), suggesting that rapid psychological transformation involves the use of both language and music. Once again, one wonders if instrumental music alone could have a comparable effect.
The final form of musical meaning discussed in this taxonomy is the most abstract, and the most speculative from the standpoint of the psychological experience of music. This is the relationship between musical structure and extramusical cultural concepts, a link that has been posited by ethnomusicologists as part of the meaning of music in a social context. An example of this type of meaning is provided by the analysis of Javanese gamelan music by Becker and Becker (1981; see also Becker, 1979). These authors note that gamelan music is structured by cyclic melodic patterns (played by different instruments) embedded within each other. The patterns proceed at different rates and thus create a cyclic pattern of predictable coincidences as the different parts come together at certain points in time. They argue that these patterns are conceptually similar to Javanese calendrical cycles. In Java, a day is described by its position within five different calendric cycles moving at different rates. The cyclic coincidences that emerge are culturally important, because they make certain days auspicious and other days hazardous. Becker and Becker write:
The iconicity of sounding gongs [i.e., gamelan music] with calendars (and other systems within the culture) is one of the devices whereby they resonate with import beyond themselves. Coincidence, or simultaneous occurrence, is a central source of both meaning and power in traditional Javanese culture. Coincidings in calendars and in music both represent events to be analyzed and scrutinized because of the importance of the concept which lies behind both. Kebetulan, “coincidence” in Indonesian, and kebeneran in Javanese both derive from root words meaning “truth,” betel/bener. As pitches coincide at important structural points in gamelan music, so certain days coincide to mark important moments in one’s personal life. One might say that gamelan music is an idea made audible and tactile (one hears with one’s skin as well as one’s ears). (p. 210)
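The arithmetic behind such coincidences is simple: cycles of different lengths realign at the least common multiple of their periods. The following Python sketch illustrates this; the 5-day pasaran and 7-day week are genuine Javanese calendrical cycles, but the gamelan part lengths are invented for illustration rather than taken from any particular composition.

```python
from math import lcm  # requires Python 3.9+

# The Javanese 5-day pasaran and the 7-day week realign every
# lcm(5, 7) = 35 days, producing the 35-day wetonan cycle.
print(lcm(5, 7))  # 35

# The same logic governs interlocking gamelan parts: cyclic patterns
# of, say, 6, 8, and 16 beats all coincide every lcm(6, 8, 16) beats.
print(lcm(6, 8, 16))  # 48
```

Because the cycle lengths in gamelan music are typically powers of two nested within one another, coincidences recur often and predictably, which is part of what makes them structurally salient.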
It might interest the reader to know that Becker and Becker find further evidence for this link between culture and art in the structure of shadow-puppet plays, which are also built around cycles and coincidences (between different plot lines) rather than around a unified causal sequence leading to a climax.
Although this is but one example, analyses of this sort are widespread in the ethnomusicological literature. Western European instrumental music can be analyzed in this framework as well, by asking what aspects of its culture are reflected in its structure. One could argue, for example, that the sense of harmonic progression that is so valued in Western musical chord syntax reflects a Western cultural obsession with progress and transformation. There may be truth in such observations, but from a psychological perspective, the relevant question is whether metaphorical connections between music and culture play any role in musical meaning as experienced by individual listeners. Presumably, the sound of a gamelan does not bring calendars to mind for Javanese listeners, and chord progressions do not provoke thoughts of the industrial revolution in Western listeners. Nevertheless, there may be an indirect psychological influence in simply making the music of one’s culture sound “natural,” in other words, somehow fitting with the general conceptual framework with which one is familiar.
Does this sort of relationship have any analog in language? Clearly, language reflects cultural concepts via its vocabulary (though see Pullum’s classic 1991 essay, “The Great Eskimo Vocabulary Hoax” for a word of caution), but it seems unlikely that linguistic structures (e.g., syntax, or discourse structure) reflect aspects of culture.11 This is sensible, because musical structure is the result of conscious choices made by individuals, whereas linguistic structure emerges from the interplay of cognitive factors with the distributed forces of language change. The link between complex sonic structures and specific cultural concepts may be unique to the domain of music.
Discussions of music’s relation to linguistic meaning often focus on semantics, and on the question of how instrumental music can be meaningful when it lacks propositional content. Note that this question assumes that “meaning” is equivalent to semantic meaning in language. If “meaning” is construed in this way, then comparisons of linguistic and musical meaning must address how close music can come to being semantically meaningful. Indeed, research in cognitive science and neuroscience has attempted to do just that (discussed in section 6.3.1 below). I believe there is another, potentially more fruitful way to compare linguistic and musical meaning, however. Recall from section 6.1.2 that linguistic meaning can be broadly divided into two distinct areas: semantics and pragmatics. Pragmatics refers to the study of how listeners add contextual information to semantic structure and how they draw inferences about what has been said. The relationship of musical meaning to linguistic pragmatics is virtually unexplored. Yet as discussed in section 6.3.2 below, this relationship may be a promising area for future investigation.
Recall from section 6.1.2 that one philosophical approach to the question of musical meaning is to hold fast to the idea that “meaning” indicates semantic reference and predication, and to therefore deny that music has meaning, or “can even be meaningless” (Kivy, 2002). In this view, any discussion of “meaning” in music commits a philosophical category error. This is certainly a minority position, because music theorists and philosophers continue to produce a steady stream of publications on “musical meaning,” treating “meaning” in a broader sense than Kivy allows (cf. section 6.2). Nevertheless, Kivy’s position is interesting because it is clearly articulated and provokes the following question: Is it categorically true that music lacks a semantic component, or is the situation more subtle and complex? Might music at times engage semantic processing, using cognitive and neural operations that overlap with those involved in language? In this section, I will focus on semantic reference (rather than predication), because comparative empirical data exist on this issue.
Let us begin our discussion with the following thought experiment. Choose the most talented composers living today, and give each of them a list of 50 common nouns or verbs (e.g., “school,” “eye,” “know”). Ask them to write a passage of instrumental music for each word that conveys the sense of the word in the clearest possible way. Then, tell a group of listeners that they are going to hear a series of musical passages, each of which is meant to convey a common noun or verb. After each passage, have the listeners write down what they think the word is (or choose it from a list of the 50 words used). Needless to say, it is extremely unlikely that the listeners will arrive at the words that the composers intended. This thought experiment demonstrates that music lacks the kind of arbitrary, specific semantic reference that is fundamental to language.
However, lacking specificity of semantic reference is not the same as being utterly devoid of referential power. Let me offer the following conceptual distinction: Instrumental music lacks specific semantic content, but it can at times suggest semantic concepts. Furthermore, it can do this with some consistency in terms of the concepts activated in the minds of listeners within a culture. The evidence for this view is empirical, and is reviewed in the following two sections.
In his operas, Richard Wagner constructed compact musical units designed to suggest extramusical meanings such as a particular character, situation, or idea. These leitmotifs were woven into the music for dramatic effect, sometimes complementing the scene and at other times serving to bring characters, and so forth, to mind when they were not part of the current scene. Although leitmotifs are most commonly discussed with reference to Wagner’s music, it is important to note that they are not limited to his music. Film music often employs the leitmotif technique. Many people who saw the 1970s film sensation, Jaws, for example, remember the menacing theme that indicated the presence of the great white shark (cf. Cross, submitted, for an interesting discussion of musical meaning in relation to this famous leitmotif).
Leitmotifs provide an opportunity to study the semantic properties of music because they are designed to have referential qualities. Specifically, one can present these musical units to listeners previously unfamiliar with them, and ask the listeners to indicate the semantic associations that the units suggest. One such study has been conducted by Hacohen and Wagner (1997), focusing on nine leitmotifs from Wagner’s Ring cycle, listed in Box 6.2. The selected leitmotifs were 7-13 seconds long and purely instrumental, and onomatopoeic leitmotifs such as the “storm” motive were avoided.
One interesting feature of Hacohen and Wagner’s study was that it took place in Israel, where there was a historical ban on playing Wagner’s music in concert halls or on the radio due to its association with the Nazi regime. Thus the researchers had unique access to a population of students familiar with Western European tonal music but with little or no exposure to Wagner.
Box 6.2 Wagnerian Leitmotifs Studied by Hacohen and Wagner (1997)
Box 6.3 Semantic Scales Used by Hacohen and Wagner (1997)
In the first part of their study, 174 listeners were presented with the motifs and asked to rate them on seven semantic scales, shown in Box 6.3. None of the scales had names that were similar to the topics of the leitmotifs. In addition, each leitmotif was rated for how much it made the listener feel “liking” or “disliking.”
Statistical analyses of the results revealed that most leitmotifs fell into one of three clusters, a “friendly” cluster, a “violent” cluster, and a “dreary” cluster, shown in Figure 6.3. As is evident from the figure, there was much consistency within clusters in terms of the semantic profiles of the leitmotifs, and each cluster even featured a pair of “synonyms,” in other words, two leitmotifs with a highly similar semantic profile (love-sleep, curse-Hunding, and death-frustration). It is interesting that most of the concepts represented in the semantic scales are affective (e.g., joy-sadness), pseudoaffective (e.g., strength-weakness, which can be seen as analogous to the “activity” dimension of a two-dimensional emotion space of valence and activity, cf. section 6.2.3), or “personality”-like (e.g., impetuosity-restraint), suggesting that listeners perceive leitmotifs in terms of a “persona” (cf. Watt and Ash, 1998).
Although these results are interesting, they are a somewhat weak test of music’s power to evoke semantic concepts, because they provide concepts in the form of semantic scales and simply ask listeners to rate them. A stronger test would be to allow listeners to choose concepts without any constraints. Fortunately, Hacohen and Wagner conducted a second experiment relevant to this issue. They asked 102 of their participants to listen to each leitmotif, imagine that it was the musical theme of a film, and then to give the film a name. No constraints or preestablished categories were given. An example of the result is shown in Figure 6.4, which gives 16 of the 75 titles produced for the “fire” motif (note that this motif did not fit into any of the three clusters in the semantic scale method).
Figure 6.4 is arranged in a “modular” fashion, with actions or events in one column, agents of the action in another column, and so forth. The figure shows a fair degree of overlap in the gist of the titles provided, as well as some diversity within this commonality. Using this “film-naming” technique, Hacohen and Wagner found a remarkable amount of consistency for each leitmotif, though only the “love” and “death” motives were given titles that matched the original ones.
Figure 6.3 Semantic profiles of Wagnerian Leitmotifs, from Hacohen & Wagner, 1997. The numbers along the x-axes represent how much a given leitmotif expresses one of two opposite semantic concepts, in which 1 represents strong expression of the concept on the left of the scale (e.g., sadness), 7 represents strong expression of the concept on the right of the scale (e.g., joy), and 4 is neutral.
They also found two other interesting things. First, the film-naming technique differentiated between leitmotifs that emerged as synonymous in the semantic scale technique. For example, for the love-sleep pair, they found that titles associated with “love” emphasized the interpersonal aspect, whereas titles associated with “sleep” emphasized scenery (tranquil images of nature) and atmosphere (mystery). This demonstrates that studies of musical semantics should not rely solely on ratings using preestablished semantic scales, but should also study the results of free association. The second interesting finding was that certain topics never appeared in the titles, such as religious, historical, urban, or domestic concepts. Hacohen and Wagner argue that this is consistent with the mythological, pagan, and ahistorical character of the Ring.
An innovative study of semantic reference in music has been conducted by Koelsch and colleagues, using event-related potentials (ERPs; Koelsch et al., 2004). The question they sought to address was whether unfamiliar, purely instrumental music (taken from Western European classical music) could suggest semantic concepts to a listener. Koelsch et al. tested this idea using a neural signature of linguistic semantic processing known as the N400. The N400 is an ERP produced in response to a word, whether presented in isolation or in a sentence context. (The name “N400” reflects the fact that it is a negative-going potential with an amplitude maximum approximately 400 ms after the onset of a word.) The amplitude of the N400 is modulated by the semantic fit between a word and its context. For example, the N400 to the final word in the sentence “the pizza was too hot to cry” is significantly larger than the N400 to the final word in the sentence “the pizza was too hot to eat” (Kutas & Hillyard, 1984). Although this “N400 effect” has been most extensively studied using frank semantic anomalies such as this one, it is important to note that the N400 effect does not require an anomaly. As discussed in Chapter 5, the N400 effect can be observed when comparing sentences in which words are simply more or less predicted from a semantic standpoint. For example, when the sentences “The girl put the sweet in her mouth after the lesson” and “The girl put the sweet in her pocket after the lesson” are compared, an N400 effect occurs for “pocket” versus “mouth” (Hagoort et al., 1999). This reflects the fact that “pocket” is a less semantically probable word than “mouth” given the context up to that word. The N400 is thus a sensitive measure of semantic integration in language, and is thought to reflect processes that relate incoming words to the semantic representation generated by the preceding context.12
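The measurement logic behind the N400 effect can be sketched numerically: average the EEG time-locked to word onset, then compare mean amplitude across conditions in a window around 400 ms. The Python sketch below uses an idealized, noise-free waveform; the sampling rate, peak shape, amplitudes, and measurement window are invented for illustration and are not values from the studies cited.

```python
import math

fs = 500  # sampling rate in Hz (assumed)
t = [i / fs - 0.2 for i in range(fs)]  # epoch from -200 ms to +800 ms

def average_erp(n400_amplitude):
    """Idealized average waveform: a Gaussian-shaped negativity at 400 ms."""
    return [n400_amplitude * math.exp(-((x - 0.4) ** 2) / (2 * 0.05 ** 2))
            for x in t]

congruous = average_erp(-1.0)    # e.g., "...too hot to eat"
incongruous = average_erp(-4.0)  # e.g., "...too hot to cry"

# The N400 effect: mean amplitude difference in a 300-500 ms window.
window = [i for i, x in enumerate(t) if 0.3 <= x <= 0.5]
effect = sum(incongruous[i] - congruous[i] for i in window) / len(window)
print(effect < 0)  # True: the incongruous condition is more negative-going
```

In real recordings, of course, the average is taken over many noisy trials per condition, and the difference wave is tested statistically across participants; this sketch only shows the windowed-amplitude comparison itself.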
A Short Map of the Fire Motif
Figure 6.4 Some titles associated with the “fire” motif in the study of Hacohen & Wagner, 1997.
Koelsch et al. exploited the N400 effect to test whether music can suggest semantic concepts. Listeners heard short musical excerpts13 taken from the classical music repertoire, each of which was followed by a visually presented noun (which could be concrete, e.g., “pearls,” or abstract, e.g., “illusion”). The particular pairing of musical excerpts and target words was chosen based on a separate behavioral study in which listeners rated the semantic relatedness of musical excerpts and words. The bases for the relatedness judgments were rather diverse. In some cases, the basis seemed metaphorical. For example, a musical excerpt judged as a good semantic fit for the word “wideness” had large pitch intervals and consonant harmonies, whereas a musical excerpt judged as a bad semantic fit for “wideness” had narrow pitch intervals and dissonant harmonies. In other cases, the match/mismatch was based on iconic resemblance between the music and the word. For example, the word “sigh” was matched with an excerpt in which the melody moved in a way that suggested the intonation of a sigh, whereas it was mismatched with an excerpt that suggested the angry intonation of a fight. Yet another way in which matches/mismatches were achieved was by conventional association of the type one might associate with film music. For example, “caravan” was matched with an exotic, Middle Eastern sounding excerpt, and mismatched with an excerpt that suggested rapid movement and a European pastoral setting.
Koelsch et al. found an N400 effect for words following semantically mismatched versus matched musical excerpts. This effect was quantitatively similar to the N400 effect seen when these same words followed semantically mismatched versus matched linguistic sentences, and occurred even when listeners were not required to make any explicit judgment about the relatedness of the excerpt and the target. Furthermore, source localization techniques suggested that the N400s to words following linguistic versus musical contexts came from similar regions of the brain: the posterior portion of the middle temporal gyrus, bilaterally (Figure 6.5). The authors argue that this indicates the ability of music to activate semantic concepts with some specificity.
Koelsch et al.’s findings raise interesting questions. First of all, it is evident that in this study the types of associations between musical excerpts and target words are rather heterogeneous. For example, conventional associations between Middle Eastern music and images of caravans are much less abstract than analogies between pitch intervals/consonance and the spatial concept of “wideness,” and both are different from the iconic resemblance between music and meaning represented by the excerpt of a musical “sigh.” Thus it would be interesting to examine Koelsch et al.’s stimuli in detail and develop a taxonomy of ways in which the musical excerpts conveyed semantic meanings. If sufficient new stimuli could be generated for each category of this taxonomy, one could do ERP studies to see which types of association most strongly drive the N400 effect. Presumably it is these associations that are truly activating semantic concepts.
Figure 6.5 Neural generators of the N400 effect elicited by target words that were semantically unrelated to preceding sentences (A), or to preceding instrumental musical excerpts (B). The estimated neural generators (shown as white dipole sources in the temporal lobes) did not differ between the language and the music conditions (x-, y-, and z-coordinates refer to standard stereotaxic space, dipole moments [q] in nanoamperes). Adapted from Koelsch et al., 2004.
Taking a step back, although the study of Koelsch et al. suggests that music can activate certain semantic concepts, it by no means shows that music has a semantic system on par with language. First of all, the specificity of semantic concepts activated by music is likely to be much lower (and much more variable between individual listeners) than that of concepts activated by language. Secondly, there is no evidence that music has semantic compositionality, in other words, a system for expressing complex semantic meanings via structured combinations of constituents (Partee, 1995). What Koelsch’s study shows is that the claim that music is absolutely devoid of semantic meaning (cf. Kivy, 2002) is too strong: The semantic boundary between music and language is not categorical, but graded.
Before closing this section, it is worth noting the interesting fact that to date there are no published studies showing that music alone can elicit an N400. Although there have been many experiments examining brain responses to incongruous notes or chords in both familiar and unfamiliar music (starting with the pioneering work of Besson & Macar, 1987), none have found an N400. A study showing that music alone (i.e., both as context and target) can produce an N400 would be of considerable interest, because it would inform both the study of “musical semantics” and the study of the cognitive basis of the N400 ERP component (cf. Miranda & Ullman, in press). In this regard, it is worth noting that out-of-key chords at the ends of chord sequences can elicit a late negativity peaking around 500 ms after the onset of the chord (e.g., the “N5” reported by Koelsch et al., 2000; cf. Johnston, 1994), but the timing and scalp distribution of this component indicates that it is not the same as the linguistic N400. One way to investigate whether this late negativity shares any neural mechanisms with the N400 is to do ERP experiments that combine sentences with chord sequences, so that semantic incongruities occur at the same time as out-of-key chords (cf. Steinbeis & Koelsch, in press). If the brain signals come from independent processes, then they should combine in an additive fashion, whereas if they draw on similar neural mechanisms, they should interact (cf. Chapter 5, section 5.4.4, subsection “Interference Between Linguistic and Musical Syntactic Processing” for an example of this strategy applied to syntactic processing).
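The additivity logic of this combined-violation design can be made concrete: if the linguistic N400 and the musical late negativity arise from independent processes, the response to a double violation should equal the sum of the two single-violation effects, whereas a shared mechanism predicts a departure from that sum. The following sketch uses invented amplitude values (in µV) purely for illustration.

```python
baseline = -1.0          # congruous word + in-key chord (invented value)
semantic_effect = -2.0   # added negativity for a semantic incongruity alone
harmonic_effect = -1.5   # added negativity for an out-of-key chord alone

# Independent processes predict pure additivity for the double violation:
predicted_additive = baseline + semantic_effect + harmonic_effect  # -4.5

# Suppose the observed double-violation amplitude is less negative than that:
observed_double = -3.5   # invented observation

# A nonzero interaction term is the signature of shared mechanisms:
interaction = observed_double - predicted_additive
print(interaction)  # 1.0 (a sub-additive combination)
```

In an actual experiment this interaction term would be estimated within a factorial ANOVA across participants, but the logic is the same: additivity supports independence, and a reliable interaction supports overlapping neural resources.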
The meaning a person derives in listening to language is more than just the meaning of individual words and sentences. For example, each sentence in example 6.1 below is perfectly well formed and meaningful, but a listener would likely judge the passage incoherent in terms of its meaning as a discourse.
(6.1) The father saw his son pick up his toy chainsaw. Seashells are often shiny on the inside. John likes peas.
In contrast, the sentences in 6.2 are likely to be perceived as coherent.
(6.2) The father saw his son pick up his toy chainsaw. The boy pretended to cut down a tree, but didn’t touch the delicate flowers growing in the garden. Mom was pleased.
The perceived coherence of 6.2 illustrates some often unappreciated facts about deriving meaning from discourse. First, it involves assuming unstated information (e.g., that “the boy” in the second sentence is the same individual as “the son” in the first sentence; that “mom” is the mother of that same boy; that “mom” is the wife of the father). Second, it also involves drawing inferences about what was said (e.g., that mom was pleased because the boy didn’t cut down her flowers). Language users are so accustomed to adding contextual information and making inferences that they are generally unaware that such processes are taking place. Nevertheless, establishing discourse coherence is central to language understanding. As noted by Kehler (2004), “just as hearers attempt to recover the implicit syntactic structure of a string of words to compute sentence meaning, they attempt to recover the implicit coherence structure of a series of utterances to compute discourse meaning” (p. 243).
The study of how listeners add contextual information to semantic structure and how they draw inferences about what has been said is called pragmatics. Linguists distinguish pragmatics from semantics: The latter focuses on the meanings of words and propositions, whereas the former focuses on how hearers recover a speaker’s intended meaning based on contextual information and inferencing (Jaszczolt, 2002). A central concern for research in pragmatics is establishing the types of conceptual connections a listener makes between utterances. As with most areas of research in linguistics, there are multiple theories regarding this issue. For the purposes of this book, I will focus on one particular theory that is notable for its concern with psychological plausibility. (It is also notable for its success in accounting for linguistic phenomena that are difficult to explain in other frameworks, though there is not space here to go into this aspect.)
This is the theory of Kehler (2002). Drawing on the philosophical work of David Hume in his 1748 Enquiry Concerning Human Understanding, Kehler’s theory posits that there are three broad types of connections that listeners make between utterances: resemblance, cause-effect, and contiguity. Importantly, Hume’s ideas were not developed in the context of the study of linguistic discourse, but as part of a philosophical investigation of the types of “connections among ideas” that humans can appreciate. Thus central to Kehler’s theory is the idea that coherence relations in language are instantiations of more basic cognitive processes that humans apply in order to make sense of sequences of events. (Kehler shares this perspective with several other linguists, including Hobbs, 1990, who first drew attention to the fact that discourse coherence relations could be classified according to Hume’s system.) Specifically, the different categories are thought to represent three basic ways in which human minds draw inferences. Resemblance relations are based on the ability to reason analogically, categorizing events and seeing correspondences between them. Cause-effect relations are based on drawing a path of implication between events. Contiguity relations are based on understanding that events happen in a certain order, and reflect knowledge about the sequence in which things happen under ordinary circumstances.
If there are general cognitive processes underlying the perception of coherence in linguistic discourse, might these same processes apply to the perception of coherence in music? That is, do common mental mechanisms apply in the interpretation of coherence relations in language and music? One reason that this question is sensible is that as with language, the perception of coherence in music requires more than the recognition of independent, well-formed musical segments: It requires that connections be perceived between the segments, connections that link the segments into a larger, organized whole. Of course, this begs the question of what the musical “segments” in question are. In the case of language, an obvious candidate for a discourse segment is a clause, though other possibilities exist (see Wolf & Gibson, 2005). In music, defining the relevant segments is not so straightforward. In particular, if one wants to segment music into nonoverlapping units for the purpose of studying coherence relations, how big should these units be? Short motifs? Middle-length phrases? Long themes? Entire sections? For the sake of argument, I will assume that the relevant level for ordinary musical listening is likely to involve fairly short units, perhaps on the order of musical phrases or themes (cf. Levinson, 1997).
We can now pose our question more pointedly: Do coherence relations between clauses in linguistic discourse have analogs to coherence relations between phrases and themes in musical discourse? In order to address this question, let us examine some specific coherence relations posited by linguistic theory, using Kehler (2002) as a source. In the category of “resemblance” Kehler lists six relations, three of which are of interest from the standpoint of comparative language-music research: parallelism (similarity), contrast, and elaboration. Linguistic examples of these relations are given in examples 6.3-6.5, respectively. (All examples below are adapted from Wolf and Gibson, 2005. Brackets indicate discourse segments.)
(6.3) [There is an oboe leaning on the black music stand.] [There is another oboe leaning on the gray music stand.]
(6.4) [John liked salsa music,] [but Susan liked reggae.]
(6.5) [A new concert series was launched this week.] [The “Stravinsky retrospective” is scheduled to last until December.]
Abstracting away from the semantic content of these sentences, each sentence exemplifies a coherence relation recognized by musicologists as basic to music. A musical phrase/theme can be recognizably similar to another phrase/theme, provide a contrast to it, or elaborate it.
Turning to the category of “cause-effect,” Kehler identifies four coherence relations, two of which are of interest to comparative research: “result” and “violated expectation.” Examples of these are given in examples 6.6 and 6.7.
(6.6) [There was bad weather at the airport,] [and so our flight got delayed].
(6.7) [The weather was nice,] [but our flight got delayed].
Again, one must abstract away from semantics to see the relationship to music. Recall that cause-effect relations concern the process of drawing a path of implication between discourse segments. The notion that musical events can be related by implication or expectation to other events is fundamental to Western music theory (e.g., Meyer, 1956; Narmour, 1990). These forces are particularly strongly at play in the perception of harmonic chord sequences, in which chord progressions create coherence via the contrast between earlier chords (which set up expectations) and subsequent chords (which either fulfill or deny these expectations). Thus it is perfectly congruent with what we know about music to think of musical segments as fulfilling or violating expectations created by previous segments.
In the final category used by Kehler, “contiguity,” there is only one coherence relation. Kehler refers to this relation as “occasion,” but the notion of “temporal sequence” (used by Wolf & Gibson, 2005) is a subset of this and is adequate for our current purposes. An example is given in 6.8.
(6.8) [Roger took a bath.] [He went to bed].
Part of what makes 6.8 coherent is one’s world knowledge that these events typically occur in a given order. Does this relation have any parallel in music? The most likely parallel is a listener’s knowledge of specific musical forms in which a succession of themes or patterns occurs in a particular order, not because of any intrinsic internal logic but simply because that is the order that the culture has specified (cf. Meyer, 1956:128).
If one accepts that there are overlapping coherence relations between language and music, as suggested by the previous paragraphs, how is one to transform this observation into comparative empirical research? One promising direction is suggested by the work of Wolf and Gibson (2005). These researchers have developed an annotation system for coherence relations in text, based largely on the relations suggested by Hobbs (1985) and Kehler (2002). Their system has eight relations, shown in Table 6.1. I have indicated those relations with putative musical parallels with an “M.” (Note that Wolf and Gibson’s cause-effect relation is akin to Kehler’s “result.”)
The annotation system of Wolf and Gibson involves segmenting a text into discourse segments (clauses), grouping topically related segments together, and then determining coherence relations between segments. These relations are diagrammed as arcs between discourse segments. For example, the text shown in 6.9 is segmented into discourse segments, indicated by the numbers 1-4:
Table 6.1 Coherence Relations Between Discourse Segments
(6.9)
1. Susan wanted to buy some tomatoes
2. and she also tried to find some basil
3. because her recipe asked for these ingredients.
4. The basil would probably be quite expensive at this time of the year.
The coherence relations between these segments are indicated in Figure 6.6.
Wolf and Gibson derive this example analysis as follows. “There is a similarity relation between 1 and 2; 1 and 2 both describe shopping for grocery items. There is a cause-effect relation between 3 and 1-2; 3 describes the cause for the shopping described by 1 and 2. There is an elaboration relation between 4 and 2; 4 provides details about the basil in 2” (pp. 265-266).
Wolf and Gibson trained two annotators in their system and had each independently annotate 135 texts (the texts had an average of 61 discourse segments and 545 words each). They found good agreement between the annotators (~90%), and went on to make several interesting observations. One of these was that there were many instances of “crossed dependencies” (e.g., the crossing of the “ce” and “elab” lines in Figure 6.6), in other words, dependencies that are not well captured by the traditional nested hierarchical tree structures used to analyze syntax. This suggests that coherence relations have a different mental organization than syntactic structures. Another finding was that certain relations are much more common than others. For example, “elaboration” was a very frequent relation, accounting for almost 50% of relations, whereas violated expectation was quite rare, accounting for about 2% of relations. Of course, data from more annotators and texts are needed, but even these preliminary findings are quite intriguing from a cognitive standpoint.
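The notion of a crossed dependency can be stated precisely over arc endpoints: two arcs cross when one begins strictly inside the other but ends outside it. The sketch below is my own illustration, not Wolf and Gibson’s code; the arc encoding for example 6.9 treats the grouped segments 1-2 as attached at their left edge:

```python
def arcs_cross(a, b):
    """True if two arcs over segment indices cross, i.e., one arc
    starts strictly inside the other and ends strictly outside it
    (i < k < j < l after normalizing endpoint order)."""
    (i, j), (k, l) = sorted([tuple(sorted(a)), tuple(sorted(b))])
    return i < k < j < l

# Example 6.9: "ce" links the group 1-2 to segment 3 (span (1, 3));
# "elab" links segment 4 back to segment 2 (span (2, 4)).
ce, elab = (1, 3), (2, 4)
print(arcs_cross(ce, elab))  # → True: the arcs cross, as in Figure 6.6
```

A nested pair such as (1, 4) and (2, 3), by contrast, is compatible with a tree structure and returns False, which is what makes the crossing cases diagnostic of non-tree organization.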
From the standpoint of comparative language-music research, Wolf and Gibson’s system is of interest because it produces coherence structure graphs (such as that in Figure 6.6) that are abstracted away from specific semantic content, and whose topology can be studied from a purely formal standpoint using principles of graph theory (Wilson, 1985). That is, one can quantify aspects of graph architecture. For example, one can quantify the mean “in-degree” of all segments (i.e., the mean number of incoming connections to a segment), the mean arc length associated with different coherence relations, and so forth. The resulting numbers provide a quantitative characterization of linguistic discourse structure.
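As an illustration of this kind of quantification, the sketch below computes mean in-degree and mean arc length for the coherence graph of example 6.9. The triple encoding and the choice to attach the “ce” relation at segment 1 are my own simplifications, not Wolf and Gibson’s data format:

```python
from collections import defaultdict

# (relation, source, target) triples for example 6.9; the "ce"
# relation from the group 1-2 is attached at segment 1 here.
relations = [("sim", 1, 2), ("ce", 3, 1), ("elab", 4, 2)]
n_segments = 4

def mean_in_degree(relations, n_segments):
    """Mean number of incoming arcs per discourse segment."""
    incoming = defaultdict(int)
    for _, _, target in relations:
        incoming[target] += 1
    return sum(incoming.values()) / n_segments

def mean_arc_length(relations):
    """Mean distance, in segments, spanned by an arc."""
    return sum(abs(s - t) for _, s, t in relations) / len(relations)

print(mean_in_degree(relations, n_segments))  # 0.75
print(mean_arc_length(relations))             # 5/3, i.e., about 1.67
```

Applied to annotated musical pieces, the same two numbers (plus others, such as the frequency of each relation type) would give directly comparable descriptions of linguistic and musical discourse structure.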
Because so many of the relations used by Wolf and Gibson have analogs in music (Table 6.1), one can imagine conducting similar analyses of musical pieces, resulting in graphs comparable to that in Figure 6.6. One could then quantify the topology of these graphs using the same measures as applied to linguistic coherence graphs, and then examine similarities and differences between the architecture of linguistic and musical discourse in a quantitative framework. It would be very interesting to know, for example, if musical pieces that generated graphs quantitatively similar to linguistic graphs were perceived as more perceptually coherent than musical pieces with topological patterns very different from linguistic patterns. Were this the case, one might argue that not only are similar cognitive principles at play in organizing the flow of meaning in linguistic and musical discourse, but that the patterning of this flow is shaped by similar forces (e.g., perhaps by limited processing resources).
Just as there are brain regions that are critical to linguistic syntax and semantics, it seems reasonable to expect that there are brain regions critical to the inferencing processes involved in linguistic discourse comprehension, in other words, processes that connect individual segments into a larger, meaningful whole. Where do these brain regions reside? There is a good deal of evidence from neuropsychology that these regions are in the right cerebral hemisphere, possibly in homologs of left hemisphere language areas (Beeman, 1993, 1998). For example, in an influential early study, Brownell et al. (1986) presented right-hemisphere-damaged patients and normal individuals with the simplest form of connected discourse: pairs of sentences. Participants were instructed to try to think of the sentences as a complete story, and were then asked true/false questions about the story. An example sentence pair and its associated questions are given in examples 6.10 and 6.11:
Figure 6.6 Coherence relations between segments of a short discourse (example 6.9 in the text). In this diagram, “sim” = “similar,” “ce” = “cause-effect,” and “elab” = elaboration. From Wolf & Gibson, 2005.
(6.10) Barbara became too bored to finish the history book. She had already spent five years writing it.
The factual questions were included as a control condition to test for simple memory abilities. Both normal individuals and patients had greater difficulty with the inference questions than the factual questions, but the critical result was that the right-hemisphere-damaged patients had a significantly greater discrepancy between performance on the inference questions and fact questions. Of course, because Brownell et al. did not test patients with left-hemisphere damage, their result could not rule out the effect of brain damage per se (vs. damage localized to the right hemisphere). Subsequent work on more patients has provided evidence, however, that the right hemisphere has a special role to play in linguistic inferencing processes (see Beeman, 1998, for a review). Functional neuroimaging work with normal individuals has been less conclusive regarding hemispheric laterality, with some studies favoring a strong right-hemisphere laterality (e.g., Mason & Just, 2004), and others favoring the involvement of regions on both sides of the brain (e.g., Kuperberg et al., 2006).
In the previous subsection, parallels were suggested between the mental processes underlying the perception of discourse coherence in language and music. One way to address whether these parallels actually reflect common neural processing is to test patients who have problems with linguistic inferencing on musical tasks that probe the perception of coherence relations in music. For example, one could use the “scrambled music” method, in which musical pieces are divided into short segments that are locally coherent (e.g., musical phrases/themes) and the order of these segments is rearranged (e.g., using the method and stimuli of Lalitte & Bigand, 2006, cf. section 6.2.1). One could then elicit judgments of musical coherence as a function of the amount of scrambling. (As with the Brownell et al. study, it would be important to have control conditions that test for memory problems.) Would patients with linguistic inferencing problems be relatively insensitive to musical scrambling compared to normal controls? If so, this would suggest commonalities in the mental processes underlying perception of coherence relations in language and music.
As with the rest of this book, the focus of this chapter is on the relationship between spoken language and instrumental music. Nevertheless, it is worth touching briefly on song, in which musical and linguistic meanings intertwine. One simple form of this interplay is “word painting,” which is related to tone painting (cf. section 6.2.5). With word painting, a composer can try to musically complement the meaning of words by using tonal patterns that iconically reflect some aspect of word meaning. For example, in the first phrase of the spiritual “Swing Low, Sweet Chariot,” the melody takes a downward leap (of 4 semitones) from “swing” to “low,” thus using pitch movement to reflect the idea of something sweeping downward to earth, whereas the second phrase “comin’ for to carry me home” has a rising pitch contour, reflecting an ascent to heaven. Another example of word painting, this time from rock music, is the opening chord of the Beatles’ song “A Hard Day’s Night.” This chord is based on a normal, harmonious-sounding G major triad, in other words, the notes G-B-D (the tonic triad of the opening key of the song), but with two additional tones added: a C (the 4th degree of the scale), and an F-natural (which is not in the scale of G major, and is thus highly unstable; cf. Chapter 5). The combination of B and C in the chord creates a dissonance, and this together with the unstable F-natural gives this chord a jarring quality that reflects the disorientation of going beyond one’s limits (Stevens, 2002:52), complementing the meaning of the song’s title (which is its opening line).
Music can go beyond word painting to create meanings that intertwine with linguistic meaning in a more sophisticated fashion. For example, aspects of harmonic syntax can be used to imply meanings that either complement or contradict the meaning of a text (Cone, 1974; Youens, 1991; and see especially Zbikowski, 1999 for a modern music-theoretic approach to text-music interactions inspired by cognitive science). This is because harmonic syntax can articulate points of tension and resolution (i.e., instability vs. stability), and points of openness and closure. For example, in the Bach chorale “Aus tiefer Noth schrei ich zu dir” (based on Psalm 130), the emotional climax occurs in the third line, when the speaker asks, “If you, O Lord, kept a record of sins, O Lord, who could stand?” This is the only question in the psalm (which ends several lines later), but it seems that Bach focuses on this line in creating the harmonic setting. The original melody of the chorale predates Bach and is tonally centered on E. Nevertheless, Bach incorporates this melody into a harmonic setting that is tonally centered on A minor, thus making the E and its associated chord serve the function of the dominant (V). In tonal music, the V chord conveys a sense of incompleteness because points of musical closure are often indicated by a movement from V to I, in other words, by a cadence (I is the tonic chord of a key; cf. Chapter 5). Remarkably, Bach ends the chorale on an E major chord, so that the chorale does not conclude with a cadence but is left harmonically open, emblematic of a question in wait of a response. Thus by using harmony, Bach has picked out a part of the psalm’s semantic meaning and spiritual significance, and given it a musical voice.
Although the interplay of linguistic and musical meaning in songs has been fertile ground for music theorists, there has been little empirical research in this area. Most of the existing work simply asks whether listeners find lyrics more meaningful when they are embedded in their original musical context, the answer typically being “yes” (Galizio & Hendrick, 1972; Iverson et al., 1989; Stratton & Zalanowski, 1994). Thompson & Russo (2004) conducted a study of this sort with interesting results. They used hit songs from the 1970s that were unfamiliar to their undergraduate participants. The lyrics were presented as spoken or as sung with a musical accompaniment. In one experiment, participants rated the extent to which the lyrics conveyed a happy or sad message. In a second experiment, listeners heard both unfamiliar and familiar songs and rated the meaningfulness of the song lyrics. (These ratings reflected the extent to which the lyrics were perceived as “informative, artful, novel, generating strong and multiple associations, and are persuasive.”) In a third experiment, separate groups of listeners rated the meaningfulness of unfamiliar lyrics/songs after just one exposure or after five exposures. The repeated exposure group heard the songs in the background while they read magazines or books.
In the first experiment, Thompson and Russo found that the musical context influenced the perceived affective valence of lyrics. For example, the lyrics of Paul Simon’s “Kodachrome,” which are rather wistful, were rated as expressing significantly more positive affect when heard in the context of their musical accompaniment, which is quite upbeat.14 In their second study, they found that the lyrics of familiar songs were judged as more meaningful when accompanied by music, but (somewhat surprisingly) the lyrics of unfamiliar songs were not. In the final study, they found that repeated background exposure to songs led to higher ratings of the meaningfulness of the associated lyrics. Together, the results of experiments 2 and 3 suggest that mere familiarity with music leads listeners to believe that music enhances the meaning of lyrics. It is as if music is semiotically protean, and via repeated association with a text comes to enhance its meaning.
All of the above studies examine the relationship between musical and linguistic meaning via a rather gross manipulation: the presence or absence of music. Drawing on the insights of music theorists, it may be more interesting for future studies to manipulate the musical structure associated with a given text and to ask if listeners show any sensitivity to the relation of linguistic and musical semiosis (e.g., via judgments of the meaningfulness of the lyrics, using measures of the type employed by Thompson and Russo). For example, one could present the Bach chorale “Aus tiefer Noth schrei ich zu dir” with a different harmonization, full of conclusive cadences, or set the lyrics of “Kodachrome” to more somber music, so that it lacks the tension between wistful lyrics and an upbeat accompaniment. If listeners who were not familiar with the original songs judged the lyrics as more meaningful in the original (vs. altered) contexts, this would provide evidence that the details of musical structure are contributing to the meaningfulness of the song.
Of the different points of contact between musical and linguistic meaning discussed in this chapter, one stands out as particularly promising. This is the link between musical and vocal cues to affect (first discussed in section 6.2.2). The idea that there may be such a link has a long history. Philosophers and theorists as far back as Plato have speculated that part of music’s expressive power lies in acoustic cues related to the sounds of emotive voices (Kivy, 2002). Such a view is sensible because there is a relationship between vocal affect and “musical” aspects of speech such as pitch, tempo, loudness, and timbre (voice quality; Ladd et al., 1985; Johnstone & Scherer, 1999; 2000). Over the years there have been a number of suggestions by music researchers about parallels between musical and vocal affect. For example, in a cross-cultural study of lullabies, including African, Native American, Samoan, and Ukrainian material, Unyk et al. (1992) examined which structural features best predicted Western adults’ ability to judge whether a song from another culture was a lullaby. The researchers found that accuracy of lullaby judgments was predicted by the percentage of descending intervals in the melodies. They related this fact to Fernald’s (1992) observation that descending pitch contours dominate in infant-directed speech used to soothe infants (whereas ascending contours dominate in infant-directed speech used to arouse infants). Another interesting link between musical and vocal affect was proposed by Cohen (1971), who focused on the rules of counterpoint in the vocal music of the Renaissance composer Palestrina (1525/6–1594). 
Cohen argues that many of these rules act to suppress sudden changes in volume, pitch, or rhythm, thus making the music similar to the prosody of “unexcited speech.” She notes, “The rules served an ideal of calm, religious expression, which satisfied the reforms introduced by the Council of Trent: that the vocal expressions in all ceremonies ‘may reach tranquilly into the ears and hearts of those who hear them …’” (p. 109).
Observations such as those of Unyk et al. and Cohen are thought-provoking and call for empirical research on the relation between acoustic cues to vocal and musical affect. Fortunately there has been a surge of research in this area, as discussed below.
Perceptual research has revealed that listeners are good at decoding basic emotions (such as happiness, sadness, anger, and fear) from the sound of a voice, even when the words spoken are emotionally neutral or semantically unintelligible, as in speech in a foreign language (Johnstone & Scherer, 2000; Scherer et al., 2001; Thompson & Balkwill, 2006).15 This suggests cross-cultural commonalities in the acoustic cues to different emotions in speech, perhaps reflecting the physiological effects of emotion on the vocal apparatus, as first suggested by Herbert Spencer (1857). Spencer also argued that these cues played a role in musical expression, based on the idea that song employs intensified versions of the affective cues used in speech. Since Spencer’s time, the idea of a parallel between affective cues in speech and music has been explored by several modern researchers (e.g., Sundberg, 1982; Scherer, 1995).
In a landmark study, Juslin and Laukka (2003) conducted a comprehensive review of 104 studies of vocal expression and 41 studies of music performance (about half of these studies focused on vocal music, and the other half on instrumental music). They found that in both domains, listeners were fairly accurate in judging the emotion intended by a speaker or performer uttering/performing a given spoken/musical passage, when the emotions portrayed were limited to five basic categories: happiness, sadness, anger, fear, and tenderness. This naturally raises the question of the relationship between the acoustic cues used by listeners in judging emotion in emotional prosody versus music performance. Juslin and Laukka found substantial overlap in the acoustic cues used to convey basic emotions in speech and music. Some of the cross-modal similarities are listed in Table 6.2.
Table 6.2 Shared Acoustic Cues for Emotions in Speech and Music
Juslin and Laukka note that in both domains, cues are used probabilistically and continuously, so that cues are not perfectly reliable but have to be combined. Furthermore, evidence suggests that the cues are combined in an additive fashion (with little cue interaction), and that there is a certain amount of “cue trading” in musical expression reflecting the exigencies of particular instruments. (For example, if a performer cannot vary timbre to express anger, s/he compensates by varying loudness a bit more.)
Given these similarities between speech and music, Juslin and Laukka put forth the interesting hypothesis that many musical instruments are processed by the brain as “superexpressive voices.” That is, even though most musical instruments do not sound like voices from a phenomenological standpoint, they can nevertheless engage emotion perception modules in the brain because they contain enough speech-like acoustic features to trigger these modules. According to this view, “the emotion perception modules do not recognize the difference between vocal expressions and other acoustic expressions and therefore react in much the same way (e.g., registering anger) as long as certain cues (e.g., high speed, loud dynamics, rough timbre) are present in the stimulus” (p. 803). This perspective is interesting because it leads to specific predictions about neural dissociations, discussed later in this section.
One criticism of the majority of studies reviewed by Juslin and Laukka is that their musical circumstances were rather artificial, consisting of a single musician who has been instructed to play a given musical passage with the intent to convey different emotions such as happiness, sadness, anger, and so forth. One might object that such studies have little to do with emotions in real music, and simply lead a performer to imitate whatever acoustic cues to the emotion are most familiar to them (e.g., cues from speech). Thus, in future work, it may be preferable to restrict analyses to more naturalistic musical stimuli, such as those employed by Krumhansl in her psychophysiological study of musical emotion (cf. section 6.2.3). Another important source of data would be from comparative ethnomusicology, especially from studies in which listeners from one culture attempt to identify the emotions in naturalistic but unfamiliar musical excerpts taken from another culture. When cross-cultural identification is successful, then acoustic cues can be examined to determine if any cues are related to the cues in affective speech prosody (cf. the research of Balkwill and colleagues discussed in section 6.2.2).
One way to test whether music engages brain mechanisms normally used to decode affective prosody in language (Juslin & Laukka, 2003) is to examine the neural relationship of affect perception in speech and music. This issue can be approached in two ways. First, one can use functional neuroimaging to examine patterns of activity in healthy brains. Research on the perception of emotional prosody has implicated regions in the right hemisphere including right inferior frontal regions (George et al., 1996; Imaizumi et al., 1997; Buchanan et al., 2000). There is some inconsistency in the localization results, however, and evidence that many brain regions (both cortical and subcortical) are involved (Peretz, 2001). There are also a few studies of the neural correlates of affect perception in music (e.g., Blood et al., 1999; Schmidt & Trainor, 2001; cf. Trainor & Schmidt, 2003), although once again the results have been variable in terms of localization and hemispheric asymmetry (see Peretz, 2001, for a review).
Some of this variability may be due to differences in methodology. For example, Blood et al. (1999) examined brain correlates of the perceived pleasantness or unpleasantness of music based on changing degrees of dissonance and found that most activation was in the right hemisphere (e.g., in the right parahippocampal gyrus and precuneus, though orbitofrontal activation was bilateral). In contrast, Schmidt and Trainor (2001) used music expressing joy, happiness, sadness, or fear and found hemispheric asymmetries in brain responses (as measured by EEG), with greater left frontal activity for positive emotions and greater right frontal activity for negative emotions. Thus at this time, one cannot state with any confidence whether the brain regions associated with vocal and musical affect perception overlap or not. However, the question is quite amenable to empirical analysis, using a within-subjects design and linguistic and musical stimuli designed to be expressive of similar basic emotions.
Besides neuroimaging, there is another approach to the study of the neural relationship of spoken versus musical affect. This is to focus on individuals who have difficulty judging the affective quality of speech following brain damage. The syndrome of receptive “affective aprosodia” has been known for some time (Ross, 1981). Such individuals have difficulty discriminating emotion in speech (and sometimes in faces) due to brain damage, despite relative preservation of other linguistic abilities (Ross, 2000). No single brain locus of this disorder has emerged: It appears that both hemispheres are involved in recognizing emotional prosody, though right inferior frontal regions appear to be particularly important (Adolphs et al., 2002; Charbonneau et al., 2002). To date, no studies have been published of these individuals’ ability to perceive emotional expression in music. Such studies are clearly called for, because they would provide an important test of Juslin and Laukka’s hypothesis that the brain treats instruments as “superexpressive voices.”
Another way to test if music and speech engage common mechanisms for affect perception is to examine transfer effects from one domain to the other. Thompson et al. (2004) have done just this, by studying whether musical training improves the ability to discriminate between different emotions expressed by the voice. At first glance, such a facilitation seems unlikely. The ability to discriminate vocal emotions would seem to be a basic biological function, important enough to survival that evolution would make the brain learn this discrimination with minimum experience, so that training in music or any other domain would make little difference to performance. On the other hand, if discrimination abilities for speech affect are related to the amount of experience with emotional voices, and if there are important similarities in acoustic cues to emotion in speech and music (as suggested in section 6.5.1), then one might expect that musical training would improve vocal affective discrimination skills. In Thompson et al.’s study, musically trained and untrained English-speaking adults listened to emotional sentences in their own language or in a foreign language (Tagalog, a language of the Philippines) and attempted to classify them into one of four basic categories: happiness, sadness, anger, and fear. The researchers found that musical training did improve the ability to identify certain emotions, particularly sadness and fear. Particularly notable was an effect of training on the ability to discriminate emotions in Tagalog: This was the first evidence that musical training enhances sensitivity to vocal emotions in a foreign language.
Of course, it is possible that individuals who seek out musical training are already more sensitive to vocal affect in general. As an attempt to get around this “causal directionality” problem, Thompson et al. (2004) conducted another experiment in which 6-year-old children were assigned to groups that took 1 year of music lessons (keyboard or singing), drama lessons, or no lessons. Prior to starting the lessons, the children were equivalent in terms of IQ and academic achievement on a number of different psychological tests. At the end of the year, the children were tested for their ability to discriminate vocal affect, using a slightly simplified procedure in which they either had to discriminate between happy and sad or between angry and frightened. The researchers found that discrimination of happy/sad vocal affect was so good across groups that no differences could be observed (a “ceiling” effect). However, the fear/anger discrimination performance levels were lower, and here significant group differences did emerge. Specifically, children who had keyboard lessons or drama lessons did significantly better than children with no lessons. That children who studied keyboard did as well as children who studied drama (in which vocal affect is an explicit target of training) is particularly interesting, and Thompson et al. suggest that this transfer effect may reflect overlapping processes in decoding emotional meaning in music and speech prosody (specifically, processes that associate pitch and temporal patterns with emotion).16
The preceding three sections show that the empirical comparative study of emotional expression (and perception) in music and speech is off to a good start and has interesting avenues to explore, for example, the perception of musical affect in patients with affective aprosodia. This section touches on some issues likely to prove important to future work. One of these concerns is the number of dimensions used to describe emotions engendered by speech and music. As discussed in section 6.2.3, psychological studies often describe emotions in terms of two distinct dimensions, one corresponding to valence (positive vs. negative) and one to activity (low vs. high). It may be that two dimensions are too few. Indeed, in a study comparing acoustic cues to affect in music and speech, Ilie and Thompson (2006) needed three dimensions to capture patterns in their data, whereas a study of vocal affect by Laukka et al. (2005) used four dimensions. Scherer (2004) also feels that two-dimensional descriptions of musical emotion are too simple, and may be low-dimensional projections of high-dimensional emotional qualia.
A second issue for future work concerns differences in the way acoustic cues map onto emotions in speech and music. Comparative speech-music research has mostly emphasized cross-domain similarities, but differences exist and are likely to prove informative for cognitive issues. For example, Ilie and Thompson (2006) found that manipulations of pitch height had opposite effects on the perceived valence of speech and music: Higher pitched speech (but lower pitched music) was associated with more positive valence.
A final issue concerns timbre. Although much research on affect in speech and music has focused on tempo, intensity, and pitch contours, the role of timbre (or “voice quality” in speech) in conveying affect has been less explored, perhaps because it is harder to measure. Yet timbre may be crucial for affect expression and perception (cf. Ladd et al., 1985). Note, for example, how similar anger and happiness are in Table 6.2 in terms of acoustic cues relating to rate, intensity, and pitch patterns. Timbral cues are thus likely to be very important in distinguishing these emotions. There can be little doubt that timbre is also important in conveying affect in music (cf. Bigand et al., 2005). (An informal demonstration is provided by listening to a musical theme as performed by an orchestra, with all of its attendant sound color, and as performed by a solo piano. These two passages contain the same rhythm and melody, but can have a very different affective impact. The difference is in the timbre.) Separate brain imaging studies of spoken voice versus instrumental music perception (using fMRI) have implicated overlapping regions in the superior temporal sulcus in vocal versus musical timbre analysis (Belin et al., 2000; Menon et al., 2002, cf. Chartrand & Belin, 2006). However, these regions appear not to overlap completely, consistent with the observation that the perception of musical (vs. voice) timbre can be selectively impaired in neurological cases (cf. Sacks, 2007). Hence it may be that vocal versus musical sounds are first analyzed for timbre by specialized brain regions, and that the resulting timbral information is then handled by common circuits in terms of mapping onto emotional qualities.
At first glance, it seems that linguistic and musical meaning are largely incommensurate. Indeed, if one restricts “meaning” to semantic reference and predication, then music and language have little in common (though perhaps more than suspected, as suggested by recent cognitive and neural research). The approach taken here is to adopt a broader view of meaning, inspired by Nattiez’s semiotic research on music (1990). In this view, meaning exists when perception of an object/event brings something to mind other than the object/event itself. This definition stimulates systematic thinking about the variety of ways in which music can be meaningful, which in turn refines the discussion of how musical and linguistic meaning are similar or different. When one takes this perspective, many interesting topics for cross-domain research come to the fore, including the expression and appraisal of emotion, the cognitive relations that make a linguistic or musical discourse coherent, and the combination of linguistic and musical meaning in song. Comparative research on music and language can help illuminate the diversity of ways in which our mind derives meaning from structured acoustic sequences.
1 It should be noted that Hanslick (1854) made a similar observation more than 100 years earlier, in The Beautiful in Music: “In music there is both meaning and logical sequence, but in a musical sense; it is a language we speak and understand, but which we are unable to translate” (p. 50).
2 I am speaking of ordinary day-to-day language, not poetry, and so forth. For ordinary language, the key point is that any natural language X can be translated into any other natural language Y well enough to convey the gist of the sentences in X. Of course, each language contains within its grammar and vocabulary a different way of looking at the world (Gentner & Goldin-Meadow, 2003), so that no translation is ever perfect.
3 I have no empirical data to support this claim other than my own experience, but I doubt that I am unique. See Holst (1962) for an informal description of Indian children coming to appreciate Western classical music after an initial aversive reaction.
4 People can perceive structural coherence in at least one type of nonlinguistic sequence over long timescales, namely in films without words. However, this reflects the perceived coherence of the story, which has a strong semantic component.
5 Indeed, some of Cooke’s writing leads quite naturally into specific and interesting empirical hypotheses (see especially 1959: Ch. 4).
6 Based on extensive firsthand cross-cultural research, Becker (2001, 2004) argues that the precise emotions experienced during musical listening appear to vary from culture to culture, being subject to social and contextual forces. Although acknowledging this point, I focus here on basic emotions that arguably appear in every human society, though they may be culturally inflected.
7 Although I have cited 20th-century thinkers here, the idea of music-specific emotions dates further back. Herbert Spencer (1857), for example, argued that “music not only … strongly excites our more familiar feelings, but also produces feelings we never had before—arouses dormant sentiments of which we had not conceived the possibility and do not know the meaning” (p. 404).
8 Sloboda (1991) also found that tears were associated with (among other things) harmonic movements through the chordal circle of fifths to the tonic. This is interesting because it is an example of an emotional response associated with a confirmation of expectancy.
9 Even if chills are considered a form of arousal, it is an unusual form of arousal compared to everyday emotions, because they are so transient.
10 Sometimes people say “his/her voice gives me the chills,” but I suspect that by this they mean that they find a particular voice unsettling in a general way. If they are in fact referring to the kind of chills under discussion here, it is most likely to be due to emotional associations with a particular person rather than a response to the sound structure of the voice per se. Indeed, “chills via association” can occur in music as well, as noted by Goldstein (1980), when music reminds a listener of an emotionally charged event or person from their past.
11 There are interesting studies of how language structure can influence the perception of nonlinguistic phenomena in the world (Gentner & Goldin-Meadow, 2003; Casasanto, in press), but this is a different topic. I am aware of only one case in which it has been argued that a culture’s linguistic syntactic system reflects a particular cultural belief system (Everett, 2005).
12 Note that this context can be as little as a single word. That is, if words are presented in pairs, and the degree of semantic relatedness between them is manipulated, an N400 effect will be observed to words that are semantically less related to their context words. Thus elicitation of the N400 effect does not require linguistic propositions.
13 The stimuli can be heard at http://www.stefan-koelsch.de/ by following the links to the Koelsch et al. 2004 article.
14 I happen to be familiar with this song, both as a listener and a performer, and I would argue that part of its affective power comes from the contrast between the wistful lyrics and the bright music, which captures something about the complex, polyvalent affect associated with memory of one’s youth.
15 Thompson and Balkwill’s (2006) study is notable for including non-Western languages. English listeners were able to identify happiness, sadness, anger, and fear in semantically neutral sentences at above-chance levels in five languages (including Japanese and Tagalog), though they showed an in-group advantage for English sentences and lower recognition rates in Japanese and Chinese sentences. This suggests subtle cultural differences in vocal affect that merit further study (cf. Elfenbein & Ambady, 2003).
16 Alternatively, it may be that musical training increases raw sensitivity to pitch variation in speech (cf. Magne et al., 2006; Wong et al., 2007), and that this is partly responsible for the observed transfer effect.
Chapter 7
Evolution
7.1 Introduction
7.2 Language and Natural Selection
7.2.1 Babbling
7.2.2 Anatomy of the Human Vocal Tract
7.2.3 Vocal Learning
7.2.4 Precocious Learning of Linguistic Sound Structure
7.2.5 Critical Periods for Language Acquisition
7.2.6 Commonalities of Structure and Development in Spoken and Signed Language
7.2.7 Robustness of Language Acquisition
7.2.8 Adding Complexity to Impoverished Linguistic Input
7.2.9 Fixation of a Language-Relevant Gene
7.2.10 Biological Cost of Failure to Acquire Language
7.2.11 Other Evidence
7.3 Music and Natural Selection
7.3.1 Adaptationist Hypotheses
Sexual Selection
Mental and Social Development
Social Cohesion
7.3.2 Testing Music Against the Evidence Used for Language Evolution
Rate of Learning of Musical Structure
Critical Period Effects
Robustness of Acquisition of Musical Abilities
Biological Cost of Failure to Acquire Musical Abilities
7.3.3 Infant Studies: Are We “Born Musical”?
Innate Perceptual Predispositions: A Biological Example
Innate Learning Preferences: A Biological Example
Music and Infancy: Questions of Innateness
Music and Infancy: Questions of Specificity
7.3.4 Genetic Studies: What Is the Link Between Genetics and Music?
An Example From Language
Genetics and Musical Tone Deafness
Genetics and Absolute Pitch
7.3.5 Animal Studies: What Aspects of Musicality Are Shared With Other Animals?
Animals, Absolute Pitch, and Relative Pitch
Animals, Consonance, and Dissonance
Animals and Tonality
7.4 Music and Evolution: Neither Adaptation nor Frill
7.5 Beat-Based Rhythm Processing as a Key Research Area
7.5.1 Domain-Specificity
7.5.2 Development
7.5.3 Human-Specificity
7.6 Conclusion
Appendix
From an evolutionary standpoint, language and music are peculiar phenomena, because they appear in only one species: Homo sapiens. The closest that nonhuman animals have come to language has been in studies in which pygmy chimpanzees learn a simple vocabulary and syntax based on interactions with humans (Savage-Rumbaugh et al., 1998, cf. Herman & Uyeyama, 1999; Pepperberg, 2002). Although these results are fascinating and important, these animals show no evidence of a language-like communicative system in the wild, based on either vocal or gestural signals (Tomasello, 2003). Furthermore, even the most precocious language-trained apes, who may acquire a few hundred words, are far surpassed by ordinary human children, who learn thousands of words and complex grammatical structures in the first few years of life. Finally, no nonhuman primate has ever been successfully trained to speak, despite numerous efforts (Fitch, 2000). Language, as we commonly understand the term, is the sole province of humans.
What of music? Initially it may seem that music is not unique to humans, because many species produce “songs” that strike us as musical (Baptista & Keister, 2005). Songbirds and certain whales (e.g., humpbacks) are notable singers, and in some species, such as the European nightingale (Luscinia megarhynchos), an individual singer can have hundreds of songs involving recombination of discrete elements in different sequences (Payne, 2000; Slater, 2000). Furthermore, songbirds and singing whales are not born knowing their song, but like humans, learn by listening to adults.
Closer inspection of the biology of bird and whale song, however, reveals several important differences from human music, a few of which are listed here (cf. Cross, 2001; Hauser & McDermott, 2003). First, songs are typically produced by males, who use them to attract mates and/or warn competing males off of territories (Catchpole and Slater, 1995). Second, hormonal and neural changes (stimulated by photoperiod) play an important role in determining seasonal peaks in avian singing (e.g., Dloniak & Deviche, 2001). This suggests that song is not a volitional aesthetic act but a biologically mediated reproductive behavior. Third, although it is true that many species learn their songs, there appear to be strong constraints on learning. Research by Marler and colleagues has shown that juvenile birds learn their own species’ song much more readily and accurately than songs of other species (Marler, 1997, 1999; Whaling, 2000). Of course, there are constraints on human musical learning as well. Due to cognitive capacity limits, for example, no child would accurately learn melodies built from a scale using 35 notes per octave (Miller, 1956; cf. Chapter 2). The constraints on human musical learning appear to be much weaker than on bird song learning, however, as evidenced by the great diversity of human music compared to the songs of any given bird species. Finally, and perhaps most importantly, the structural diversity of animal songs is not associated with an equal diversity of meanings. On the contrary, animal songs always advertise the same set of things, including readiness to mate, territorial warnings, and social status (Marler, 2000; cf. Chapter 5 of this book). Thus the fact that English uses the term “song” for the acoustic displays of certain animals should not mislead us into thinking that animals make or appreciate music in the sense that humans do.1
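The cognitive-capacity point about a 35-note octave can be made concrete with a quick back-of-the-envelope calculation (mine, not the text's): in an equal-tempered division of the octave, 35 notes would leave steps of only about a third of a semitone, taxing both memory and pitch discrimination.

```python
# Step size of a hypothetical equal-tempered scale, in cents
# (1 octave = 1200 cents). Illustrative arithmetic only; the
# 35-note scale is the text's thought experiment, not a real system.

def step_size_cents(notes_per_octave: int) -> float:
    """Size of one scale step in cents for an equal division of the octave."""
    return 1200.0 / notes_per_octave

# A familiar 12-note chromatic octave has 100-cent steps;
# a 35-note octave squeezes steps down to roughly 34.3 cents.
print(round(step_size_cents(12), 1))  # 100.0
print(round(step_size_cents(35), 1))  # 34.3
```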
Given the universality and uniqueness of language and music in our species, it is clear that these abilities reflect changes in the brain that have taken place since our lineage’s divergence from our common ancestor with chimpanzees about 6 million years ago (Carroll, 2003). Thus one can accurately say that language and music evolved in the human lineage. However, this sense of “evolved” is not very meaningful from a biological standpoint. The ability to make and control fire is also universal and unique to human cultures, and one could therefore say that fire making “evolved” in the human lineage. Yet few would dispute that the control of fire was an invention based on human ingenuity, not something that was itself a target of evolutionary forces. Thus a more biologically meaningful question is, to what extent have human bodies and brains been shaped by natural selection for language? Similarly, to what extent have human bodies and brains been shaped by natural selection for music? These questions are the main focus of this chapter.
The example of fire making teaches us that when we see a universal and unique human trait, we cannot simply assume that it has been a direct target of selection. In fact, from a scientific perspective it is better (because it assumes less) to take the null hypothesis that the trait in question has not been a direct target of selection. One can then ask if there is enough evidence to reject this hypothesis. The two main sections of this chapter address this question for language and music. Section 7.2 argues that there is enough evidence to reject this hypothesis in the case of language, whereas section 7.3 argues the opposite for music. That is, section 7.3 reviews a variety of relevant research and concludes that there is as yet no compelling evidence that music represents an evolutionary adaptation. Section 7.4 asks, “If music is not an evolutionary adaptation, then why is it universal?” The final section discusses an area of research likely to prove crucial to future debates over the evolutionary status of music, namely beat-based rhythmic processing.
Before embarking, it is worth discussing two phenomena that may seem to be prima facie evidence of natural selection for music. The first involves the existence of individuals with selective deficits in music perception due to brain damage, in other words, with “acquired selective amusia” (e.g., Peretz, 1993; Peretz & Coltheart, 2003). For example, such individuals may be unable to recognize familiar melodies or to spot out-of-key (“sour”) notes in novel melodies, despite the fact that they were previously quite musical and still seem normal in other ways. The existence of such individuals shows that parts of the brain have become specialized for music. It is sometimes thought that this “modularity” indicates that natural selection has shaped parts of the brain to carry out musical functions. In fact, the modularity of music processing in adults is orthogonal to the issue of selection for musical abilities. This is because modules can be a product of development, rather than reflecting innately specified brain specialization. An instructive phenomenon in this regard is orthographic alexia: a specific deficit for reading printed letters due to brain damage. Because written language is a recent human invention in evolutionary terms, we can be confident that specific brain areas have not been shaped by natural selection for reading printed script. Nevertheless, both neuropsychological and neuroimaging studies have shown that there are areas in the occipitotemporal region of the left hemisphere that are specialized for recognizing alphabetic letters in literate individuals (Cohen et al., 2002; cf. Stewart et al., 2003; Hébert & Cuddy, 2006). In this case, the areas are in a part of the brain ordinarily involved in object recognition. This is a powerful demonstration of “progressive modularization” during development (Karmiloff-Smith, 1992; Booth et al., 2001) and shows why selective deficits of music are largely irrelevant to evolutionary arguments.
A second phenomenon that might seem to suggest natural selection for music concerns the role of genes in musical ability. Musical “tone deafness” is estimated to occur in about 4% of the population (Kalmus & Fry, 1980). It is not due to hearing loss or lack of exposure to music, or to any social or affective abnormality. In fact, musically tone-deaf individuals can appear perfectly normal in every other way, except for a striking difficulty with music that they have often had for as long as they can remember (Ayotte et al., 2002; Foxton et al., 2004). For example, they cannot hear when music is out of tune (including their own singing voice) and cannot recognize what should be very familiar melodies in their culture unless the words are provided. In 1980 Kalmus and Fry noted that tone deafness tends to run in families. A more recent study found that identical twins (who share 100% of their genes) resemble each other more closely on tests of tone deafness than do fraternal twins (who share only 50% of their genes on average). These findings suggest that there is a specific gene (or genes) that puts one at risk for this disorder (Drayna et al., 2001). To some, this might suggest that there are “genes for music,” and that therefore music was a target of natural selection. Although we will delve into the genetics of music and language later in this chapter, for now suffice it to say that tone deafness provides no evidence for genes that are specifically involved in musical abilities.2 On the contrary, current evidence suggests that tone-deaf individuals have rather basic deficits with pitch perception that impact their musical abilities because music places much more stringent demands on pitch perception than does any other domain (cf. Chapter 4, section 4.5.2).
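The logic of the twin comparison can be sketched with a standard behavioral-genetics estimate — Falconer's formula, which is my illustration and is not mentioned in the text. Under its simplifying assumptions, heritability is roughly twice the difference between the identical-twin and fraternal-twin correlations; the correlation values below are purely hypothetical placeholders, not data from the Drayna et al. study.

```python
# Falconer's classic twin-study heritability estimate (illustrative only):
#   h^2 ~= 2 * (r_MZ - r_DZ)
# where r_MZ and r_DZ are phenotypic correlations for identical (MZ)
# and fraternal (DZ) twin pairs. MZ twins share ~100% of their genes,
# DZ twins ~50% on average, so a larger MZ correlation suggests a
# genetic contribution to the trait.

def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Rough heritability estimate from twin-pair correlations."""
    return 2.0 * (r_mz - r_dz)

# Hypothetical numbers: if identical twins correlated 0.7 on a
# tone-deafness test and fraternal twins 0.4, the estimate would
# suggest a substantial genetic component.
print(round(falconer_heritability(0.7, 0.4), 2))  # 0.6
```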
Thus there is no simple, cut-and-dried evidence that music has been the focus of natural selection. The question of selection’s role in human musical abilities requires careful investigation. The approach taken here is to compare music and language in this regard.
Have human bodies and brains been shaped by natural selection for language? Before delving into details, it is worth clarifying what is meant by “natural selection for language.” Clearly, people are not born knowing their native tongue: They learn the sonic and structural patterns of their native language in childhood. Thus “natural selection for language” really refers to selection for the ability to acquire language. From an evolutionary perspective, what we want to know is whether selection has played a direct role in shaping the mechanisms of language acquisition. Those who favor a direct role for selection can be termed “language adaptationists.” For example, Pinker and Jackendoff (2005) argue that language acquisition involves a set of cognitive and neural specializations tailored by evolution to this rich communicative system. In their view, the selective pressure behind language evolution was the adaptive value of expressing and understanding complex propositions.3
An alternative view proposes that there has been no direct selection for language. Instead, selection acted to create certain unique social-cognitive learning abilities in humans, such as “shared intentionality,” the capacity and drive for shared goals and joint attention. According to Tomasello et al. (2005), this transforms individual cognition into cultural cognition, lays the foundation for imitation and instructed learning, and permits the invention of a cooperative communication system based on symbols. In this view, humans used their special social-cognitive abilities to construct language. Tomasello and other “language constructivists” offer a very different picture of language evolution than language adaptationists, and have presented some cogent criticisms of the adaptationist position (cf. Tomasello, 1995; Elman, 1999). One interesting corollary of the constructivist viewpoint is the notion that our brains are not biologically different from those of our most recent prelinguistic ancestors. This is because language is seen as a social construct. A useful analogy is to chess playing. Chess playing is a complex cognitive ability that is unique to our species, but which has certainly not been the target of natural selection. Although some of the cognitive abilities used in chess may have been direct targets of selection, these abilities were not shaped by selection for chess nor are they specific to it (cf. Sober, 1984). The key conceptual point here is that natural selection has played only an indirect role in shaping our chess-playing abilities.
The debate between language adaptationists and language constructivists can thus be seen as a debate between scholars who believe in a direct versus an indirect role for natural selection in shaping human linguistic abilities. There are also well-articulated positions that do not fall neatly into either category: For example, Deacon (1997) has argued that language and the human brain have acted as selective forces on each other via a process of coevolution. For those interested in the range of current thinking on language evolution (which is broad and growing rapidly), there are several useful books and review articles to consult (e.g., Christiansen & Kirby, 2003a, 2003b). The main point is that there is a lively debate over the extent to which human bodies and brains have been directly shaped by selection for language. What follows is a list of 10 lines of evidence that I find most compelling in favor of a direct role for natural selection in the evolution of language.
One of the most striking aspects of language development is babbling: Around the age of 7 months, babies begin to produce nonsense syllables such as /ba/ and /da/ in repetitive sequences (Locke, 1993). Babbling likely helps babies learn the relationship between oral movements and auditory outcomes, in other words, to tune the perceptual-motor skills they will use in acquiring their species’ communication system. Babbling is evidence that selection has acted on language acquisition because its emergence is spontaneous, and not simply an imitation of adult speech. A key piece of evidence in this regard is that deaf babies produce vocal babbles even though they have no experience with the speech of others (Oller & Eilers, 1988). Thus the onset of babbling appears to reflect the maturation of neural mechanisms for vocal learning.4 This view is further supported by experimental studies of songbirds, who, like humans, learn to produce complex acoustic signals for communication. Young birds go through a babbling stage (called “subsong”) during which they produce immature versions of the elements of adults’ song (Doupe & Kuhl, 1999). Furthermore, they babble even if deafened at birth (Marler, 1999). Thus babbling appears to be evolution’s way of kick-starting the vocal learning process.
Interestingly, in the human case, the mechanisms underlying babbling are not bound to speech. Deaf babies exposed to sign language will babble with their hands, producing nonreferential signs in reduplicated sequences (e.g., the sign language equivalent of “bababa”; Petitto & Marentette, 1991; Petitto et al., 2001, 2004). Emmorey (2002) has suggested that babbling thus represents the maturation of a mechanism that maps between motor output and sensory input, and guides humans toward the discovery of the phonological structure of language, whether spoken or signed.
Compared to other primates, humans have an unusual vocal tract: The larynx sits much lower in the throat due to a gradual anatomical descent between about 3 months and 3 years of age (Fitch, 2000).5 The low position of the larynx in humans has a biological cost. In other primates, the larynx connects with the nasal passages, allowing simultaneous swallowing and breathing. In contrast, humans risk choking each time they ingest food. Lieberman and colleagues (1969; cf. Lieberman, 1984) were the first to suggest that this special aspect of human anatomy was related to the evolution of language. Lieberman argued that a lowered larynx increases the range and discriminability of speech sounds because it gives the tongue room to move both vertically and horizontally in the vocal tract, allowing for a more distinctive palette of formant patterns (formants are discussed in Chapter 2). This has been a fertile hypothesis, leading to a good deal of research and debate (Ohala, 1984; Fitch, 2000; cf. Ménard et al., 2004). As of today, it is still a viable idea and points to a direct role for natural selection in shaping the human body for speech and thus for language.6
Vocal learning refers to learning to produce vocal signals based on auditory experience and sensory feedback. This ability seems commonplace to us because every child exhibits it as part of learning to speak. An evolutionary perspective, however, reveals that vocal learning is an uncommon trait, having arisen in only a few groups of animals (including songbirds, parrots, and cetaceans; cf. Fitch, 2006; Merker, 2005). Notably, humans are unique among primates in exhibiting complex vocal learning (Egnor & Hauser, 2004). Even language-trained apes are poor at imitating spoken words, and rely instead on visual symbols to communicate.7 The neural substrates of vocal learning in humans are not well understood, but this difference between humans and other primates is almost certainly part of an ensemble of characteristics (including babbling and a modified vocal tract) shaped by natural selection to favor children’s acquisition of a complex acoustic communication system.
The phonemes and syllables of human languages are acoustically complex entities, and infants seem to come into the world prepared to learn about them. By 6 months of age, they start to show evidence of learning the particular vowel sounds of their language (Kuhl et al., 1992). Soon thereafter, they lose sensitivity to certain phonetic contrasts that do not occur in their language, and gain sensitivity for other, difficult phonetic contrasts in their native tongue (Werker & Tees, 1984; 1999; Polka et al., 2001). That is, infants come into the world with open ears suited to the sounds of any human language, but soon begin to “hear with an accent” in ways that favor their native tongue. Infants also show an impressive ability to recognize the equivalence of a speech sound (such as a particular vowel or a syllable) across differences in speaker, gender, and speech rate, a task that has proven difficult for even very powerful computers (e.g., Kuhl, 1979, 2004).
Complementing these perceptual skills are skills at sound production. Speaking involves regulating the airstream while producing rapid and temporally overlapping gestures with multiple articulators (Stevens, 1998). Children largely master these complex motor skills by the age of 3 or 4, producing speech that is not only fluent and highly intelligible but that has many of the subtleties of their native language (e.g., context-dependent phonological modifications, which contribute to native accent). These abilities stand in sharp contrast to other motor skills of 3- to 4-year-olds, such as their ability to throw and catch objects with accuracy.
Of course, perception and production of speech are intimately linked in learning. One fact that suggests a role for selection in shaping learning mechanisms is the exquisite sensitivity of vocal development to perceptual experience in this domain. For example, even casually overhearing a language before the age of 6 can have an influence on one’s production of phonemes from that language as an adult. Au et al. (2002; see also Knightly et al., 2003) found that American college students who had overheard informal Spanish for a few years before the age of 6 produced word-initial stop consonants such as /p/ and /k/ in a manner identical to native speakers when asked to produce Spanish sentences.8 In contrast, a control group without casual childhood exposure produced these sounds in a manner more characteristic of native English speakers. Interestingly, the two groups performed no differently on tasks that involved making grammatical judgments about Spanish, showing that the casual-exposure group did not simply know Spanish better in any explicit sense. Thus the speech-learning system seems specially prepared to put perceptual experience to work in shaping production abilities (cf. Oh et al., 2003 for a replication and extension of this work based on Korean).
The field of infant speech perception is a dynamic research area with evolving theories and ongoing debates (e.g., Kuhl, 2004; Werker & Curtin, 2005), but there is consensus on the following fact: Humans are precocious learners of the sound structure of language, achieving complex perceptual and production skills at a very young age. This suggests that natural selection has shaped the mechanisms of speech acquisition.
A critical period or sensitive period is a time window when developmental processes are especially sensitive to environmental input. Input (or lack of it) during this time can have a profound effect on the ultimate level of adult ability in specific areas. Vocal development in songbirds is a well-studied case in biology. If birds who learn their song (e.g., baby swamp sparrows) are not permitted to hear the song of an adult before a certain age, they will never acquire full proficiency in their species’ song, even if given unlimited exposure later in life (Marler, 1999). The mechanisms behind critical periods are still a matter of debate. One idea is that the rate of neural proliferation (e.g., as measured by synaptic density) in specific brain areas is under biological control, with gradual reduction over time due to internal factors (Huttenlocher, 2002). Critical periods are suggestive of mechanisms shaped by natural selection to favor early acquisition of ethologically important abilities.
In an influential book, Lenneberg (1967) proposed that humans have a critical period for language acquisition that ends at puberty. Of course (and thankfully), one cannot test this idea with the kind of deprivation experiments done with songbirds. There are a few cases of language-deprived children (e.g., Curtiss, 1977), but the confounding effects of social trauma make these cases hard to interpret (Rymer, 1993). At the present time, the best evidence for a critical period for language comes from studies of second-language learning and from research on sign language. In the former category, Johnson and Newport (1989) studied Chinese and Korean immigrants to the United States, examining their proficiency in English as a function of the age at which they entered the country (overall number of years of exposure to English was matched across groups). They found strong effects of age of exposure on grammatical abilities. One might argue that this is due not to any biological mechanism, but simply to the fact that older individuals have a better developed native language, which interfered more with the acquisition of a new language. This concern is mitigated by studies of sign language acquisition. Mayberry and colleagues (Mayberry & Eichen, 1991; Mayberry & Lock, 2003) have shown that when sign language input is delayed in individuals with no other language, there is a significant impact on adult grammatical skills and on the ability to acquire a second language later in life.9
Although there is little doubt that speech is the biologically given channel for human language, it is a remarkable fact that individuals can communicate linguistically without sound, by using sign language. It is important to note that true sign languages such as American Sign Language (ASL) and British Sign Language (BSL) are not merely spoken language translated into gesture (e.g., finger-spelled English). Instead, they are distinct human languages with structural patterns that can be quite different from the surrounding spoken language (Klima & Bellugi, 1979; Emmorey, 2002). Thus, for example, although American and British English are mutually intelligible, ASL and BSL are not.
Cognitive research has revealed that sign and spoken languages share the basic ingredients of human language: phonology, morphology, syntax, and semantics, and that deaf children exposed to sign from an early age acquire these aspects of language in ways quite parallel to hearing children (Emmorey, 2002). Furthermore, studies of sign language aphasia and modern neuroimaging techniques have revealed that despite its visuospatial modality, sign language relies on many of the same left hemisphere brain areas as spoken language (see Emmorey, 2002, Ch. 9 for an overview). I shall have more to say about sign language in section 7.2.8 below. For now, suffice it to say that the fact that language can “jump modalities” is a powerful testament to the human drive for language, suggesting a role for natural selection in shaping the language acquisition process.
All normal infants exposed to language rapidly develop a complex set of linguistic production and perception skills, as evidenced by the linguistic abilities of 3- to 4-year-olds. An interesting question about this process concerns the variability in the quality and quantity of linguistic input received by infants and young children. In one of the few published studies that measured language input to the infant over an extended period, van de Weijer (1998) recorded most of what a single Dutch infant heard during its waking hours between 6 and 9 months of age. In analyzing a subset of this data (18 days), he found that the infant heard about 2.5 hours of speech per day, of which about 25% was directed at the infant (i.e., about 37 minutes per day, on average). This seems surprisingly small, and is no doubt less than some infants receive (i.e., those alone at home with a talkative caregiver). Yet it may be far more than babies receive in some cultures, in which it is claimed that adults barely talk to prelingual infants (e.g., the Kaluli of Papua New Guinea; see Schieffelin, 1985). Unfortunately, we lack data on the range of linguistic input to infants who subsequently develop normal language. I suspect, however, that this range is quite large (even within Western culture). If this proves to be true, then the robustness of language acquisition suggests that natural selection has shaped both a strong predisposition to learn language and the mechanisms to learn it quickly, even from minimal input.
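The figures reported from van de Weijer's recordings can be checked with simple arithmetic: 2.5 hours of overheard speech per day with about a quarter of it infant-directed does indeed work out to roughly 37 minutes.

```python
# Sanity-checking the van de Weijer (1998) figures as reported in the text:
# ~2.5 hours of speech heard per day, of which ~25% was infant-directed.

hours_per_day = 2.5
infant_directed_fraction = 0.25

minutes_directed = hours_per_day * 60 * infant_directed_fraction
print(minutes_directed)  # 37.5 -- matching the "about 37 minutes per day" in the text
```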
In making this point, I do not wish to suggest that variability in experience does not matter to language development. On the contrary, there is good evidence for a relationship between the richness and complexity of linguistic input and a child’s own vocabulary acquisition, use of complex syntax, and so forth (Huttenlocher et al., 1991; Hart & Risley, 1995). However, the key point is that these relations build on a basic set of abilities that appear to develop with remarkable robustness.
Over the past 25 years, linguists have had an unprecedented opportunity to document the birth of a new language, created by a deaf community in Nicaragua. The history of this community is interesting. Before the 1970s, deaf Nicaraguans were mostly isolated from each other, and communicated with their hearing families using gestures that showed only the rudiments of language and that varied widely from one person to the next (“home sign” systems; Goldin-Meadow, 1982; Coppola, 2002). However, in 1977, a special school for deaf children was founded in Managua, and its student body grew rapidly. Although the teachers focused on lip reading and speaking Spanish (with little success), the deaf children used gestures to communicate among themselves, and began a process of language development in which each new cohort of children learns from the previous cohort. Research on this community has produced a remarkable finding: As the language develops, it is becoming increasingly grammaticalized, and this change is being driven by the children. That is, the second cohort did not just reproduce the language of the first cohort; they changed it as they learned it. For example, they systematized the use of spatial locations to signal grammatical relationships, enabling the expression of long-distance dependencies between words (Senghas & Coppola, 2001). They also showed a strong preference for breaking holistic gestures into discrete components and arranging these in sequences (Senghas et al., 2004). These findings complement earlier work on the emergence of structure in spoken Creole languages (Bickerton, 1984; Holm, 2000), but provide an even stronger case for mental predispositions for language learning, because the Nicaraguan children had no access to adults who spoke a fully developed language. 
That is, unlike most cultural systems, in which the structures used by adults are more organized and systematic than those used by children, Nicaraguan Sign Language illustrates a case in which children are creating the more systematic patterns. This is suggestive of learning predispositions that have been shaped by natural selection (cf. Sandler et al., 2005).
Recent years have seen the discovery of a single-gene mutation in humans that has a strong influence on speech and language. When one copy of this gene (known as FOXP2) is damaged, individuals show a range of problems with speech and language, including deficits in oral movements (which extend to complex nonspeech gestures), difficulty in manipulating phonemes, and problems with grammar and lexical judgments (see Marcus & Fisher, 2003 for an overview). These individuals also have nonlinguistic deficits (Alcock et al., 2000a, b), but importantly, they do not simply suffer from an overall intellectual deficit: Verbal abilities are more impaired than nonverbal abilities, with some individuals having nonverbal abilities in the normal IQ range.
FOXP2 is not unique to humans. It occurs in many other species, including chimpanzees, birds, and crocodiles. However, the exact DNA sequence of human FOXP2 differs from other species, and shows almost no variation within Homo sapiens. Quantitative analysis of this variability suggests that this gene has been a target of selection in human evolution, and was fixed (i.e., became universal) in its current form within the past 200,000 years (Enard et al., 2002).
FOXP2 is discussed further in section 7.3.4, where we will delve into more mechanistic details about this gene and its role in brain and language. For the moment, the relevant point is that the extremely low variability in the DNA sequence of human FOXP2 suggests that this language-relevant gene has been a target of natural selection.
If a trait has been directly shaped by natural selection, this by necessity means the trait conferred an evolutionary advantage in terms of survival and reproductive success. Thus if an existing member of a species fails to develop the trait, that individual is likely to bear a significant biological cost. For example, a little brown bat (Myotis lucifugus) that cannot echolocate is likely to have much lower survival and reproduction rates than one that can. With bats, one can do an experiment to test this idea (e.g., by plugging the bat’s ears from birth). With humans, we thankfully cannot do language-deprivation studies, but it seems highly likely that a human being without language abilities would be at a severe disadvantage in terms of survival and reproduction in any human society, past or present.
The brief review above by no means exhausts the evidence that one could proffer in favor of a role for natural selection in language evolution. A few examples will suffice to illustrate this point. First, Deacon (1997) argues that the relative size of our frontal cortex is unusually large compared to other primates, and that this reflects selective pressure for specific cognitive operations underlying language. Second, MacLarnon and Hewitt (1999) report that regions of the spinal cord involved in supplying neurons for breath control are enlarged in Homo ergaster (erectus) and Homo sapiens relative to other primates and earlier hominids, and argue that this may reflect an adaptation for speech. Third, Conway and Christiansen (2001) review studies showing that humans outperform other primates in their ability to learn hierarchical sequences, and one could argue that this reflects selection for cognitive skills that are employed by language. Fourth, it is often claimed that universals of linguistic grammar are evidence for an innate knowledge of language. I have avoided these and other arguments because they are subject to a great deal of debate. For example, recent comparative analyses of primate frontal cortex suggest that human frontal cortex has not expanded much beyond what one would predict for a primate of our size (Semendeferi et al., 2002). With regard to increased breath control, Fitch (2000) has pointed out that this may have evolved for reasons other than language (e.g., prolonged running, cf. Bramble & Lieberman, 2004). Grammatical universals are the focus of an interesting controversy, with some researchers arguing that these are not built into our brains but result from the convergence of a number of forces, including human limits on sequential learning and the semiotic constraints that govern complex communication systems (e.g., Christiansen & Kirby 2003b and references therein; Deacon, 2003). 
Finally, evidence that humans are superior at learning hierarchical sequences may not reflect specific selection for language (for example, in principle this could arise via selection for music!).
The salient point, however, is that even if one eschews these contentious lines of evidence, there is still a solid basis for rejecting the null hypothesis for language evolution. That is, the combined weight of the evidence reviewed in sections 7.2.1 to 7.2.10 strongly favors the idea that human bodies and brains have been shaped by natural selection for language.
As noted in the introduction, the fact that music is a human universal is not evidence for a direct role of natural selection in the evolution of music. Nevertheless, music’s universality, combined with its lack of an obvious survival value, has puzzled evolutionary thinkers since Darwin. In The Descent of Man (1871), Darwin remarked that our musical abilities “must be ranked among the most mysterious with which [humans are] endowed.” Since Darwin’s time, attitudes toward music and evolution have followed two main courses. One line of thought regards music as an enjoyable byproduct of other cognitive skills. For example, William James said that a love of music was “a mere incidental peculiarity of the nervous system, with no teleological significance” (cited in Langer, 1942:210). This theme has been further developed in recent times by Pinker (1997) and others. In contrast, another line of thought favors the view that our musical abilities were a direct target of natural selection (e.g., Wallin et al., 2000; Balter, 2004; Mithen, 2005), an idea first proposed by Darwin (1871). Enthusiasm for this latter idea has spread rapidly, and there are a growing number of hypotheses about the possible adaptive roles of music in human evolution. A few of these are reviewed in section 7.3.1 below. As we shall see, each hypothesis has its problems, and it is likely that the original adaptive value of music (if any) will always be shrouded in mystery.
Thus a more promising approach to the study of music and evolution is to shift away from adaptationist conjectures to focus on factors that can be empirically studied in living organisms (cf. Fitch, 2006). For example, is there evidence that humans enter the world specifically prepared to learn about musical structures? Are there genes that have been shaped by selection for music? Are nonhuman animals capable of acquiring basic musical abilities? Analogous questions for language were behind the evidence reviewed in section 7.2 above. Ultimately, it is answers to these sorts of questions that will decide if the null hypothesis should be rejected (i.e., in the case of music, the hypothesis that humans have not been specifically shaped by evolution to be musical).
The purpose of this section is to examine the evidence for natural selection’s role in directly shaping human musical abilities. To this end, music is first examined with regard to the 10 lines of evidence that were presented for language (in section 7.2 above). Following this, additional data from development, genetics, and animal studies are considered (in sections 7.3.3, 7.3.4, and 7.3.5). Prior to embarking on this survey, however, it is worth describing a few adaptationist hypotheses about music, to give a flavor of some proposed functions of music in human evolution.
Miller (2000), taking up an idea proposed by Darwin (1871), has argued that music is analogous to bird song in that it is a product of sexual selection (selection for traits that enhance the ability to successfully compete for mates). Miller sees music as a psychological adaptation that permitted males to perform complex courtship displays as part of competition for access to females. Presumably these displays conveyed various desirable mental and physical abilities to females. In support of his hypothesis, Miller claims that humans show a peak in their musical interests in adolescence, and that male musicians produce far more music (as measured by recorded output) than female musicians at this age.
This hypothesis has numerous difficulties. For example, sexual selection for male song production typically results in salient male-female differences in anatomy and/or behavior (e.g., among birds, whales, frogs, crickets), yet there is no evidence that human males and females differ in any substantial way in their music production or perception abilities (Huron, 2003). Another problem is that human music serves many roles both within and across cultures, including healing, mourning, celebrating, memorizing, and so forth (Cross, 2003). Courtship is only one of music’s many functions, and there is no evidence that music is either necessary or sufficient for successful courtship across human cultures. A final difficulty with the idea is that the social patterns adduced by Miller as evidence for sexual selection can be explained by cultural factors, such as the importance of music in identity formation in adolescence (cf. Chapter 6) and male dominance in the control of the recording industry.
A very different idea about music’s adaptive value has been articulated by Cross (2003). Cross suggests that music plays an important role in mental development, in that it exercises and integrates a variety of cognitive and motor abilities and provides a safe medium for the exploration of social behavior. In other words, by promoting cognitive flexibility and social facility, music aids in the development of mind. Although this idea is appealing, it, too, encounters difficulties. If music were an important catalyst in mental development, then one would expect that individuals with congenital musical deficits would have detectable problems or delays in cognitive or social abilities. Yet this does not appear to be the case. About 4% of the population is estimated to be “tone-deaf,” in other words, to have severe lifelong difficulties with music perception (cf. Chapter 4, section 4.5.2). This condition may result from genes that affect the auditory system (Kalmus & Fry, 1980; Drayna et al., 2001). At the current time, there is no evidence that tone-deaf individuals suffer from any serious nonmusical cognitive deficits or delays, or exhibit abnormal socialization of any sort (Ayotte et al., 2002). In fact, the tone-deaf can count numerous intellectually and socially prominent individuals among their ranks, including Milton Friedman (the Nobel Prize-winning economist) and Che Guevara (the charismatic Latin revolutionary). Thus although music may enrich the development of mind, it does not appear to be necessary for normal mental development. This stands in contrast to language, because recent studies of an emerging language in Nicaragua suggest that language is necessary for the development of certain basic cognitive skills. One such skill is “false belief understanding,” in other words, the ability to recognize that one’s own thoughts and beliefs can be different from those of others, and/or mistaken (Pyers, 2006, cf. Section 7.2.8).
The final hypothesis discussed here concerns the role of music in promoting social cohesion among members of a group. Among the many different adaptationist conjectures about music, this is the most often heard and the one that attracts the most support. According to this hypothesis, music helped cement social bonds between members of ancestral human groups via its role in ritual and in group music making. The intuitive appeal of this idea is obvious. First, it does seem that a great deal of musical activity in contemporary small-scale cultures is social (Morley, 2003). Second, it is known that music can be a powerful force in mood regulation (Sloboda & O’Neill, 2001), suggesting that group music making could result in a shared mood state. Third, it seems plausible that a common mood state would enhance the subjective sense of a bond between individuals.
The social cohesion hypothesis occurs in various forms. One variant, proposed by Dunbar (2003), is that group singing resulted in endorphin release, thus mimicking the neural effects of physical grooming in primates (cf. Merker, 2000). (This forms part of Dunbar’s larger argument that language evolved primarily to convey social information, thus allowing our hominid ancestors to replace physical grooming with “grooming at a distance” by means of verbal communication, and thus to achieve larger group size. Dunbar argues that group singing preceded the evolution of language.) Presumably this endorphin release would have some positive effect on social behavior, which in turn would have a payoff in terms of increased reproductive success (cf. Silk et al., 2003 for evidence of a link between sociability and reproductive success in primates).
Another variant of the social cohesion hypothesis focuses on the bond between mothers and infants. Trehub (2000, 2003a) and Dissanayake (2000), among others, have drawn attention to the role of music in mother-infant bonding. Trehub’s work is especially notable for its empirical basis. She and her colleagues have done extensive research on lullabies across numerous cultures, identifying shared structural features such as simple structure and falling contours, which might serve to soothe the infant (e.g., Unyk et al., 1992; Trehub et al., 1993; Trehub & Trainor, 1998). Trehub and colleagues have also shown that maternal singing is attractive to infants: 6-month-olds look longer at audiovisual presentations of their own mother performing infant-directed singing versus infant-directed speech (Nakata & Trehub, 2004). In addition, this research team has examined cortisol in infant saliva (an indicator of arousal levels) before and after mothers spoke or sang to them. Cortisol levels dropped significantly after both types of stimuli, but stayed low for a longer period after infant-directed song than after infant-directed speech (Shenfield et al., 2003). Based on these and other data, Trehub has argued that maternal singing has an adaptive function, allowing female humans to soothe their babies without necessarily having to touch them (see Balter, 2004). This may have been an advantage to our ancestors because infant humans (unlike other infant primates) cannot cling to their mother’s fur while the mother forages with both hands. Singing would have allowed mothers to put their infants down while foraging and still soothe them (cf. Falk, 2004a, b).
Although the social cohesion hypothesis is appealing, it, too, faces a number of challenges. First, the premise that music is primarily a social activity in small-scale cultures deserves critical scrutiny. It may be that most of the obvious music in these cultures is social, but there also may be a great deal of “covert music” as well, especially songs. By covert music, I mean music composed and performed discreetly for a very select audience (perhaps even only for oneself), because it represents an intimate statement of love, memory, loss, and so forth. A summer spent among the Wopkaimin of central Papua New Guinea, a tribal people who lived a stone-age hunter-horticulturalist lifestyle, alerted me to the fact that the amount of private, covert music can greatly outweigh the amount of social, overt music in small-scale cultures.
A more serious difficulty for the social cohesion hypothesis concerns one of its implications. If music were an adaptation to promote social bonding, then one would predict that biologically based social impairments would curtail responsiveness to music. Yet autistic children, who have pronounced deficits in social cognition, are sensitive to musical affect (Heaton et al., in press; cf. Allen et al., submitted). Furthermore, autistic individuals sometimes achieve musical abilities that are remarkable given their limited linguistic and other intellectual abilities (Rimland & Fein, 1988; Miller, 1989). Music thus does not appear to have an obligatory relationship to brain mechanisms involved in social behavior, as one might predict from the social cohesion hypothesis.
Finally, with regard to the mother-infant version of the social cohesion hypothesis, one may note that mothers have many ways of soothing their infants and bonding with them, and that musical interactions (although pleasing to mother and infant) may not be necessary for normal bonding or soothing to occur. The crucial weakness of this hypothesis is that there are at present no data to suggest that singing is needed for normal social or emotional development. (Those interested in studying this question could examine the social development of hearing infants with loving mothers who do not sing, perhaps including hearing children of deaf adults, or “CODAs.”)10
How does music fare with regard to the 10 lines of evidence for natural selection’s role in language evolution (reviewed in section 7.2)? Some of these lines of evidence would seem to apply equally well to music. For example, babbling, vocal learning, and the anatomy of the vocal tract could all reflect adaptations for an acoustic communication system that originally supported both language and vocal music (cf. Darwin, 1871; Mithen, 2005). It is thus ambiguous which domain (music or language) provided the relevant selective pressures for these features of human biology. Fixation of a relevant gene (FOXP2; section 7.2.9) is also ambiguous with regard to the source of selection pressure, because this gene appears to be involved in circuits that support both speech (articulation and syntax) and musical rhythm (cf. section 7.3.4 below). With regard to adding complexity to impoverished input (section 7.2.8), there are to my knowledge no relevant data to speak to this issue. Fortunately, there are relevant data with respect to the remaining lines of evidence, as reviewed below.
Linguistic skills develop remarkably quickly. Even before the end of the first year of life, infants undergo perceptual changes that attune them to the relevant phonological contrasts in their native language, and by the age of 3 or 4, they are producing the highly complex phonological and grammatical patterns of fluent speech. Do humans also show precocious abilities when it comes to the perception or production of musical sound patterns?
One can address the question above by examining the development of an important aspect of musical cognition: sensitivity to tonality relations. This refers to sensitivity to the organized use of pitches in terms of scales, a tonal center, and so forth (cf. Chapter 5). (I focus here on Western European music, which has been the focus of more developmental research than any other tradition. It should be noted, though, that organized scales and the use of a tonal center are widespread in human music.) A very basic form of sensitivity to tonality is sensitivity to key membership in a sequence of notes. Normal adults with no musical training can readily spot “sour” notes in a melody, in other words, notes that violate the local key even though they are not odd in any physical sense (indeed, failure to detect sour notes is one symptom of musical “tone deafness” in adults; cf. Ayotte et al., 2002).
Current evidence suggests that sensitivity to key membership develops rather slowly in humans. Trainor and Trehub (1992) tested the ability of infants and adults to detect a change in a short, tonal melody that was repeated in the background.11 Infants indicated their detection of a change by turning toward the loudspeaker on “change trials,” for which they were reinforced by an animated toy. Half of the change trials contained a change of 4 semitones to one note, resulting in a new pitch that was still in key. On the other half of the change trials, the same note was changed by just 1 semitone, resulting in an out-of-key note. Thus the design cleverly pitted a large physical change against a smaller change that violated a learned tonal schema. The adults detected the out-of-key change better than the in-key change, whereas the infants detected both types of changes equally well (though their absolute level of performance was lower than the adults). This provided evidence that implicit knowledge of key membership is not in place by 8 months of age.
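The in-key versus out-of-key contrast in this design reduces to scale membership: whether a shifted note still belongs to the pitch-class set of the key. A minimal sketch (an illustration using C major, not the actual study stimuli) shows how a large 4-semitone change can remain in key while a small 1-semitone change from the same note does not:

```python
# Pitch classes of the C major scale (C=0, C#=1, ..., B=11).
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def stays_in_key(pitch_class: int, shift: int, scale=C_MAJOR) -> bool:
    """Return True if shifting a note by `shift` semitones lands on
    another pitch class of the given scale."""
    return (pitch_class + shift) % 12 in scale

# A 4-semitone change can stay in key: C (0) -> E (4).
print(stays_in_key(0, 4))   # True
# ...while a 1-semitone change from the same note leaves the key: C -> C#.
print(stays_in_key(0, 1))   # False
```

This is why the design is informative: the physically larger change is the tonally unremarkable one, so only a listener with an internalized tonal schema should find the smaller change more salient.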
In a follow-up study of similar design, Trainor and Trehub (1994) showed that 5-year-olds were better at detecting out-of-key changes than in-key changes to melodies, though the children’s discrimination performance was still only about half as good as adults’ (cf. Costa-Giomi, 2003). Thus somewhere between 8 months and 5 years, children develop a sense of key membership, though the sense is still not as strong as in adults (cf. Davidson et al., 1981). Unfortunately, there is a paucity of research on the developmental time course of this ability between 8 months and 5 years. Dowling (1988) and colleagues conducted an experiment in which 3- to 5-year-old children heard tonal and atonal melodies matched for statistical properties such as average interval size (here “tonal” means all notes conformed to a single key). The task was to identify which melodies had “normal notes” and which had “odd notes.” The children were divided into two groups based on how in tune their own spontaneous singing was. Dowling et al. found that 3-year-olds who sang more in tune could just barely discriminate tonal from atonal melodies (58% correct, in which chance performance was 50% correct). Three-year-olds who sang more out of tune did not discriminate the melodies.
Clearly, more work is warranted to study the development of sensitivity to key membership between 8 months and 5 years. Neural techniques such as ERPs are particularly promising for this purpose, because they do not require the child to make overt decisions about tonality (cf. Koelsch et al., 2003). The relevant neural work has yet to be done, however. The existing behavioral data suggests that sensitivity to key membership develops rather slowly in children. This stands in sharp contrast to language development, because ordinary 3- and 4-year-olds are adept at perceiving complex phonological and grammatical structures in language. This slower development of musical pitch perception may be one reason why children’s singing is typically “out of tune” until about 5 years of age (Dowling, 1988). If musical pitch abilities had been the target of natural selection, one would expect accurate perception and production of musical pitch patterns to be learned far more quickly than the observed rate. The slow development is especially striking given that the music children hear (e.g., nursery songs) is especially strong in its tonal structure, with few out-of-key notes (Dowling, 1988).
One might object that sensitivity to key membership is actually quite a complex aspect of music, and that many other musically relevant abilities are in fact evident in young infants, such as sensitivity to melodic contour and an ability to discriminate consonant from dissonant sounds. However, it may be that these sensitivities are byproducts of attunement to speech or of general auditory processing principles (I return to this point in section 7.3.3, subsections on music and infancy). In contrast, sensitivity to key membership does not have any obvious relationship to nonmusical auditory functions, and is thus a more specific test of the development of musical abilities. Until good evidence appears showing rapid development of musical skills that are not related to language or to general principles of auditory function, there is no reason to reject the null hypothesis that music has not been a direct target of natural selection.
As noted in section 7.2.5, a critical period or sensitive period is a time window when developmental processes are especially sensitive to environmental input. Input (or lack of it) during this time can have a profound effect on the ultimate level of adult ability in specific areas.
Is there a critical period for the acquisition of musical skills? One way to study this would be to conduct studies of musical ability in individuals who started to learn music at different ages, but who were matched for overall number of years of training. Such studies would be entirely parallel to the studies of second-language learning conducted by Newport and colleagues (cf. section 7.2.5). Unfortunately, such work has yet to be done (Trainor, 2005). If such studies indicate that early-onset learners are superior in their sensitivity to musical phonology (e.g., the perceptual categorization of sound elements such as intervals or chords; cf. Chapter 2) or musical syntax, this would favor a critical period hypothesis. However, even without doing these experiments, it seems that even if a critical period effect for music is found, the effect will be rather weak compared to language. One reason to suspect this is that some highly accomplished musicians did not start playing their instrument until after the age of 10. (To take just one example, George Gershwin was introduced to the piano at 13.) In contrast, a child with no linguistic input until this age is unlikely to ever reach even a normal level of ability.
Some evidence for a disproportionate influence of early musical experience on the brain comes from modern structural neuroimaging, which allows one to study the neuroanatomy of healthy individuals. Bengtsson et al. (2005) used a special type of magnetic resonance imaging to examine white matter pathways in professional pianists.12 (White matter refers to neural fibers that connect different parts of the brain.) They found that the amount of practice strongly correlated with the amount of white matter in specific neural regions, with the amount of childhood (vs. adolescent or adult) practice having the strongest predictive power in certain regions. Thus, for example, the amount of childhood practice correlated strongly with the amount of white matter in the posterior limb of the internal capsule, which carries descending fibers from the motor cortex to the spinal cord, and which is important for independent finger movement. That is, even though there were significantly fewer practice hours in childhood than in adolescence or adulthood, the amount of practice during childhood was more predictive of the adult neuroanatomy of this region than the amount of practice during adolescence or adulthood.
Although this study indicates that early musical experience can have a disproportionate impact on the neuroanatomy of the motor system, it does not provide evidence of a critical period for cognitive abilities related to music. It is this latter issue that is more relevant for evolutionary issues (e.g., recall from section 7.2.5 how critical periods in language influenced phonology and syntax). There is, however, one aspect of musical cognition that shows a strong critical period effect, namely absolute pitch (AP). (Those unfamiliar with AP should consult section 7.3.4, subsection on absolute pitch, for an introduction.) It has been well documented that the prevalence of AP in musicians depends strongly on the age of onset of musical training. Specifically, it is very rare for individuals to have this ability if they began training after age 6 (Levitin & Zatorre, 2003). This is a provocative finding, given that the critical period for language acquisition is also thought to be around 6 or 7 years of age. However, AP is a rare trait, estimated to occur in 1 in 10,000 people, which means that AP is not necessary for the development of normal musical abilities. Furthermore, there is no evidence to suggest that people with AP are more musically gifted overall than non-AP musicians (e.g., in composition, creativity). Thus the critical period for AP is not evidence for a critical period for music acquisition. (See Trainor 2005 for a concise review of AP from a developmental perspective.)
To summarize, at the current time, there is no good evidence for a critical period in the acquisition of musical cognitive abilities (e.g., sensitivity to tonal syntax). Until such evidence is produced, the null hypothesis for the evolution of music is not challenged.
Humans appear far more uniform in their linguistic than in their musical abilities. Although some normal people are certainly more fluent speakers or have keener ears for speech than do others, these variations seem minor compared to the range of musical abilities in normal people. Of course, empirical data on variation in abilities within both domains is needed to make a fair comparison. Ideally, such data would include tests designed to tap into implicit skills, in which explicit training in music is not a confounding variable (Bigand, 2003; Bigand et al., 2006).
Although such empirical comparative research has yet to be done, existing data do suggest a good deal of variability in musical abilities. To take just one example, consider the ability to tap in synchrony with a musical beat. This basic ability appears widespread, but in fact there are very few estimates for how common it is among randomly selected nonmusicians. Drake, Penel, and Bigand (2000) tested 18 individuals who had never taken music lessons or played an instrument. The participants were asked to listen to one piano piece at a time, and then tap in synchrony with the perceived beat on a second listening, a task that musically trained participants found very easy. Drake et al. used a very lenient measure of success at synchronization: All that was required was 10 consecutive taps that fell within a broad window surrounding the idealized beat location. Even with these lax criteria, they found that nonmusicians were unable to synchronize on over 10% of the trials. Thus it appears that the ability to tap to a beat may not be as universal as one might expect, though empirical data are needed to ascertain what percent of musically untrained individuals have this basic ability.
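The lenient criterion described above lends itself to a simple algorithmic statement: scan the tap sequence for a run of 10 consecutive taps that each land near an idealized beat. The sketch below is a hypothetical reconstruction, not Drake et al.'s actual analysis code; in particular, the 200-ms tolerance and the beat grid are invented values standing in for the "broad window" described in the text.

```python
def meets_lenient_criterion(taps, beats, window=0.2, run_length=10):
    """Return True if `taps` (in seconds) contains `run_length` consecutive
    taps that each fall within `window` seconds of the nearest beat."""
    run = 0
    for t in taps:
        nearest = min(beats, key=lambda b: abs(b - t))
        if abs(t - nearest) <= window:
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0  # a single stray tap breaks the run
    return False

# Hypothetical data: idealized beats every 0.5 s.
beats = [i * 0.5 for i in range(40)]
steady_taps = [i * 0.5 + 0.03 for i in range(12)]  # small constant offset
drifting_taps = [i * 0.61 for i in range(12)]      # tempo mismatch: drifts off the beat

print(meets_lenient_criterion(steady_taps, beats))    # True
print(meets_lenient_criterion(drifting_taps, beats))  # False
```

Even this forgiving test fails for the drifting tapper, which is the sense in which such a criterion, though lax, still separates synchronizers from non-synchronizers.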
Another indicator that musical development is less robust than language development concerns the modality-uniqueness of music. The fact that human language can jump modalities from a sound-based system to a silent, sign-based system attests to the robustness of language acquisition mechanisms. If musical acquisition were equally robust, one might expect that deaf communities would create their own “sign music,” in other words, a nonreferential but richly organized system of visual signs with discrete elements and principles of syntax, created and shared for aesthetic ends by an appreciative community. However, no such system exists (K. Emmorey, personal communication). Some deaf individuals do enjoy acoustic music via some residual hearing function and the tactile sensations afforded by rhythm, but this is not the same as a sign-based form of music. Of course, one could argue that without basic auditory distinctions such as consonance and dissonance, music cannot “get off the ground.” Yet such an argument has a post hoc flavor. In principle, the types of oppositions created by consonance and dissonance (e.g., sensations of roughness vs. smoothness) could have iconic parallels in visual symbols. Alternatively, “sign music” might be able to dispense with this opposition altogether, just as it dispenses with auditory distinctions that seem so central to spoken language (such as the basic distinction between vowels and consonants).
Overall, the current evidence suggests that the development of musical abilities is not nearly as robust as the development of linguistic abilities. This is consistent with the null hypothesis that there has been no specific selection for music in human evolution.
As noted in section 7.2.10, traits that have been shaped by natural selection are by definition important to the survival and reproductive success of individuals. For example, adults without language would be at a serious disadvantage in human culture, whether one is speaking of our hunter-gatherer ancestors or of modern humans living in a complex technological setting. Is there any evidence that lack of musical abilities has a biological cost? Consider the case of musically tone-deaf individuals who are unable to do even very simple musical tasks such as recognize a familiar melody or spot a sour note in a novel melody. (Informal observation suggests that there may also be a population of individuals who are “rhythm deaf” and cannot keep a beat or do other basic rhythmic tasks.) Such individuals appear to bear no biological cost for their deficit, and there is no evidence that they are less successful reproducers than musically gifted individuals. This is consistent with the view that natural selection has not shaped our bodies and brain for specifically musical purposes.
Section 7.3.2 showed that music does not fare well when compared to language with regard to several lines of evidence for natural selection. Those lines of evidence focused on how musical abilities develop in normal individuals. This section takes a different approach, and focuses on the musical abilities of infants. Proponents of the music-as-adaptation view often suggest that human babies are “born musical.” It is certainly true that infants show very good auditory discrimination skills for frequency, pitch, timbre, and durational patterning (Fassbender, 1996; Pouthas, 1996). Furthermore, they show numerous more sophisticated abilities relevant to the development of musical skills, such as the ability to recognize the similarity of pitch patterns based on melodic contour independent of exact interval sizes or overall pitch level, and to recognize the similarity of rhythms based on grouping structure independent of tempo. In addition, babies from Western cultures show a preference for consonant over dissonant intervals, superior encoding of certain pitch intervals such as the perfect fifth, and a preference for infant-directed singing over infant-directed speech (see Trehub 2000, 2003b; and Trehub & Hannon, 2006, for reviews). To some, these findings suggest that evolution has specifically shaped humans to have the processing skills and predispositions needed to acquire mature musical abilities.
Although there is little doubt that infants have musically relevant traits, from an evolutionary standpoint the real question is whether they have innate predispositions or innate learning preferences that are specific to music (Justus & Hutsler, 2005; McDermott & Hauser, 2005). If instead these characteristics can be explained via other traits with a more obvious selective advantage (such as language), or as a byproduct of general auditory processing mechanisms, then they provide no evidence of selection for music. Furthermore, before this question of specificity is raised, one must have firm evidence that the predispositions or learning preferences really are innate and not due to experience. This is a nontrivial issue for infant studies, because auditory learning begins before birth (cf. section 7.3.3, subsection “Music and Infancy: Questions of Innateness”). Thus before discussing the auditory proclivities of human infants, it is worth giving two examples from animal studies that illustrate innate biases in perception. Both examples come from research on birds.
Compelling evidence for innate perceptual predispositions comes from comparative work on baby domestic chickens and Japanese quails (henceforth chicks and quails). Park and Balaban (1991) housed chick and quail eggs in incubators in complete isolation from parental vocalizations. After hatching, birds were given a perceptual test in which they were placed in a chamber with speakers embedded in the left and right walls. Each speaker produced the maternal call of one of the two species (the maternal call is produced by a mother to draw baby birds to her, e.g., in case of danger). Each bird was exposed to equal numbers of maternal calls from the two species. Park and Balaban measured each bird’s perceptual bias in terms of the amount of time it spent pushing against one wall or the other, and found that chicks and quails showed a significant preference to approach the calls of their own species.
A decade after this original study, Long et al. (2001) performed an impressive experiment that probed the neural basis of this preference. Using surgical techniques pioneered by Balaban et al. (1988), the researchers cut small holes in the eggs and operated on the embryos, transplanting different portions of the developing neural tube of quails into chicks. They then sealed up the eggs and housed them in incubators isolated from adult bird sounds. After hatching, they tested these chimeric birds for their perceptual preferences using the methods of Park and Balaban (1991). They found that when the transplant was in a specific region of the developing midbrain, the chimeras showed a preference for the quail maternal call. (They also showed that this was not due to the chimeras producing a quail-like call themselves.) Thus the scientists were able to transplant an inborn perceptual preference. Interestingly, subsequent work by these authors suggests that the transplanted region does not act as a simple “brain module” but has developmental effects on other regions of the brain, including the forebrain (Long et al., 2002). Thus the perceptual preference of a baby chicken or quail for its own species’ maternal call may be the result of neural interactions among a number of brain regions.
These studies provide solid evidence for an innate perceptual predisposition. However, inborn predispositions are not the only form of innateness. Learning can also be guided by innate factors, as discussed in the next section.
Studies of songbirds provide good evidence for innate learning preferences in animals. It is well known that if a young white-crowned sparrow is not allowed to hear the song of an adult during an early portion of his life, it will never sing a normal song, but will produce instead a much simplified “isolate song” (Marler, 1970; cf. Zeigler & Marler, 2004). This fact has allowed biologists to explore how selective these birds are in terms of the songs that they will learn. One method that has been used to explore this question is to expose a young, male, white-crowned sparrow to songs of a variety of species (e.g., those that might occur in its natural habitat). Under these circumstances, the bird shows a strong predilection to learn the song of its own species (Marler & Peters, 1977). Furthermore, it has been demonstrated that this is not due to the inability to produce the songs of other species (Marler, 1991).
For neurobiologists, the key question is what acoustic cues and brain mechanisms mediate selective learning, and research on this issue is underway (e.g., Whaling et al., 1997). For our purposes, the relevant point is that evolution can provide animals with inborn learning preferences that do not depend on prior auditory experience with species-specific sound patterns. The innate learning preferences of white-crowned sparrows, like the innate perceptual predispositions of chicks and quails, are unquestionably the result of natural selection, because they are not a byproduct of any other trait and because they have a clear adaptive value.
Box 7.1 summarizes a number of findings from human infant research that are often suggested as innate biases relevant to music (see references in Trehub 2000, 2003b, and in the following paragraph).
The cumulative weight of these findings has led some researchers to speculate that human musical abilities have been a direct target of natural selection. There is a problem with this idea, however. Many of the findings (i.e., items 1-7 in Box 7.1) can be accounted for by biases related to speech processing or as a byproduct of general auditory processing, as explored in the next subsection (on issues of specificity). Let us briefly consider the remaining findings in turn. A preference for consonant over dissonant intervals and music has been repeatedly demonstrated in infants (e.g., Trainor, Tsang, & Cheung, 2002). Although intriguing, the role of prior experience in shaping this preference has been difficult to rule out: Infants have likely had significant exposure to music by the time they are tested (e.g., via lullabies, play songs), and this music almost certainly features many more consonant than dissonant intervals. This is a concern because research has established a link between mere exposure and preference for musical materials (e.g., Peretz, Gaudreau, et al., 1998). A strong test of an inborn preference for consonant versus dissonant intervals requires testing infants without previous musical exposure.
Fortunately, Masataka (2006) has conducted a study that comes close to this ideal. He tested 2-day-old infants of deaf and sign-language-proficient parents, who presumably had little prenatal exposure to the sounds of music and speech. Masataka examined looking times to a 30-second Mozart minuet in two versions: the original version and a dissonant version in which many of the intervals had been modified to be dissonant (Sound Example 7.1a, b; stimuli originally from Trainor & Heinmiller, 1998). The newborns showed a preference for the consonant version of the minuet, though the preference was very slight, which points to the need for replication. If this result can be shown to be robust, it would be important evidence in evolutionary debates over music, given the lack of such a preference in nonhuman primates (cf. section 7.3.5, subsection “Animals, Consonance, and Dissonance”). However, as noted by Masataka, even in his study population, it is impossible to rule out possible prenatal exposure to music in the ambient environment. This is a concern for any study of infant music cognition due to evidence for prenatal learning of musical patterns (as discussed in more detail later in this section). Thus at this time, whether humans have an inborn preference for consonant musical sounds is still an open question.
Turning to a bias for asymmetric musical scales (discussed in Chapter 2, section 2.2.2), this bias is actually not shown by adults, who perform no better on unfamiliar asymmetric than symmetric scales (Trehub et al., 1999). Thus the bias is rather weak and can be overruled by cultural factors. Even confining discussion to the infant data, it could be that asymmetric patterns of pitch intervals provide listeners with a better sense of location in pitch space, just as it is easier to orient oneself in the physical space of a room if the walls are of different lengths. That is, a music-specific cognitive principle may not be needed.
The musicality of mother-infant vocal interactions (e.g., Trevarthen, 1999; Longhi, 2003) is intriguing, especially because it is known that mother-infant interactions are subject to well-organized temporal contingencies (Tronick et al., 1978; Nadel et al., 1999). However, at the current time, we do not know whether mothers and infants interact in a musical fashion across cultures, because only a limited number of cultures have been studied.
Lullabies have been documented across a wide range of cultures (Trehub, 2000) and have thus attracted attention in evolutionary studies (McDermott & Hauser, 2005). Lullabies also show a similarity of form and function across cultures (Unyk et al., 1992), which might lead one to suspect that infants are innately tuned to this specific musical form. However, this cross-cultural prevalence and similarity may simply reflect the fact that infants find vocalizations with certain characteristics (such as slow tempo and smooth, repetitive, falling pitch contours) to be soothing (cf. Papousek, 1996). Furthermore, infants usually find rituals comforting. Thus it may not be surprising that adults all over the world arrive at similarly structured musical rituals for the purpose of soothing infants.
Of all the items in Box 7.1, the preference for infant-directed singing over infant-directed speech is perhaps the most suggestive with regard to innate predispositions for music. Evidence for this comes from a study by Nakata and Trehub (2004), who conducted an experiment in which 6-month-old infants watched a video (with sound) of their own mother. One set of infants saw their mothers singing, whereas the other set saw them speaking. Both the speaking and the singing examples had been recorded during an initial visit to the lab in which the mother was recorded while either speaking or singing directly to her infant. When speaking, mothers naturally used infant-directed speech (or “motherese”). Infant-directed (ID) singing differs from ID speech in a number of ways, including slightly lower average pitch, more tightly controlled pitch variation (as expected, because singing involves moving between well-defined pitch levels), and slower tempo (Trainor et al., 1997; Trehub et al., 1997). The researchers found that infants looked significantly longer at videos of maternal song than of maternal speech. This is an interesting result because infants seem to be expressing a preference for music over speech, a somewhat surprising result if one believes that speech reflects biological adaptation but music does not.
The key question, however, is what is driving the longer looking time for videos of song versus speech. Perhaps the simplest explanation is a novelty preference, under the assumption that maternal singing is less commonly experienced by infants than maternal speech. Let us assume for the moment, however, that this is not the driving factor. It is known that ID singing is much more stereotyped than ID speech, being performed at nearly identical pitch levels and tempos on different occasions (Bergeson & Trehub, 2002). Thus song is a ritualized performance that infants may find compelling, in the same way they find other rituals (such as games like peek-a-boo) compelling. Alternatively, as suggested by Nakata and Trehub, ID singing may be preferred because infants perceive it as more emotive than ID speech (cf. Trainor et al., 1997, 2000). Thus their preference for music in this study could be an inborn preference for acoustic cues to positive affect, just as they prefer to listen to happy emotive speech rather than to affectively neutral speech (even when the former is adult-directed and the latter is infant-directed; Singh et al., 2002). A preference for positively affective vocal stimuli may also explain the finding that newborns prefer infant-directed over adult-directed singing, even when their parents are deaf and they thus had minimal exposure to singing before birth (Masataka, 1999; cf. Trainor et al., 1997).
If preference for ID song (vs. ID speech or adult-directed song) is driven by vocal cues to positive affect, this would predict that the preference would disappear if ID singing is done in a way that preserves the accuracy of the music but makes it affectively duller than the ID speech or adult-directed song. The basic point is that preference studies, if they are intended to address the question of innate predispositions for music, require careful controls. (It should be noted that Nakata and Trehub and Masataka did not design their studies to address evolutionary issues.)
Thus at the current time, it appears that the case for music-specific innate biases is weak. However, it is possible that some compelling evidence will come to light in the future. To be convincing, however, future developmental studies of innate biases will have to address an issue that faces all current studies, namely, the issue of prior exposure to music and the learning that accompanies it. What makes avian studies of innate biases so persuasive (cf. section 7.3.3, first two subsections) is that they have rigorously excluded prior exposure as a possible variable in accounting for their results. Current research on infant music cognition is far from attaining this standard.
That being said, if it could be shown that auditory learning is minimal prior to the age at which most infant music perception studies are conducted (e.g., 6-8 months), then one might not need to worry. Unfortunately, the data point in exactly the opposite direction. Unlike visual experience, auditory experience begins well before birth (Lecanuet, 1996). Humans begin responding to sound around 30 weeks’ gestational age, and by the time of birth, they have already learned a good deal about their auditory environment. This has been demonstrated by studies of newborns tested using a method that allows them to indicate their auditory preferences by their rate of sucking on an artificial nipple (the “nonnutritive-sucking paradigm”; see Pouthas, 1996, for an introduction). This research has shown that newborns prefer their mother’s voice to that of a stranger (DeCasper & Fifer, 1980), a story read by the mother during the last 6 weeks of pregnancy to a novel story (DeCasper & Spence, 1986), and their native language to a foreign language (Mehler et al., 1988; Moon et al., 1993). In each case, the preference could arise only through learning, in which prosodic patterns probably play an important role (DeCasper et al., 1994; Nazzi et al., 1998; Floccia et al., 2000).
The preferences of newborns for familiar auditory stimuli are not limited to speech. Hepper (1991) had one group of mothers listen to a particular tune once or twice a day throughout their pregnancy, whereas another group of mothers did not listen to this tune. Newborns from the former group recognized the tune, as indicated by changes in heart rate, movement, and alertness upon hearing the tune after birth (the control group showed no such changes). To ascertain if the newborns were responding on the basis of the specific tune (or if prior exposure to music simply made them more responsive to music in general), Hepper conducted another study in which mothers listened to the same tune during pregnancy, but babies were tested with a different tune or a backward version of the original tune. In this case, the newborns did not show any sign of recognition. Hepper went on to show that selective response to the familiar tune could be observed even before birth (via ultrasound monitoring of fetal movement), at 36-37 weeks of gestational age. Hepper did a final study with 29- to 30-week-old fetuses, showing that there was no evidence of tune recognition at this age.
Of course, if it can be shown that prenatal learning is limited to rhythm and melody in speech and music, then one could at least test newborns for preferences based on the spectral structure of isolated sounds, such as single syllables versus closely matched nonspeech (as in Vouloumanos & Werker’s [2004] work with 2-month-olds), or such as consonant versus dissonant musical intervals (as in Trainor, Tsang, and Cheung’s [2002] work with 2-month-olds). A recent study of fetal auditory perception, however, suggests that fetal sound learning may not be limited to prosodic cues. Kisilevsky et al. (2003) presented 38-week-old fetuses with recordings of their own mother or another woman reading the same poem (each mother’s reading served as the “unfamiliar” voice for another fetus). Recordings were presented via a loudspeaker held above the abdomen. Fetal heart rate showed a sustained increase in response to the mother’s voice, but a sustained decrease in response to the stranger’s voice. This study is notable because the control condition was the same linguistic text produced by another speaker, which means that gross rhythmic and prosodic patterns were likely to be similar. More work is clearly called for, using Kisilevsky et al.’s paradigm with acoustic manipulations to determine which cues the fetus is using in discriminating its mother’s voice from that of another woman (e.g., is it simply mean fundamental frequency, or is timbre also important?). The relevant point, however, is that by the time infants participate in early speech and music perception experiments (e.g., at 2 months of age), they have already learned a good deal about the sounds of their environment.
How then can one use infant studies to ask questions about processing predispositions? There are a number of ways in which this can be done. First, one can do cross-cultural experiments using musical material from another culture that is unfamiliar to the infants. This strategy has been employed by Lynch and colleagues, who have used the Javanese musical system to explore predispositions in the perception of musical scales (cf. Chapter 2, section 2.4.4), as well as by Hannon and Trehub, who used Balkan rhythms to study infants’ ability to detect temporal alterations in familiar versus unfamiliar metrical patterns (cf. this chapter’s appendix). A related approach is to test infants from non-Western cultures on the perception of Western musical patterns. This would be a useful control for studies of Western infants that have shown a processing advantage for certain small-ratio pitch intervals, such as the perfect fifth (e.g., Schellenberg & Trehub, 1996; discussed in appendix 3 of Chapter 2). However, it is increasingly hard to find infants in other cultures who have not been exposed to Western music. It may be that the best approach for future cross-cultural research is to involve parents in studies of controlled exposure to music of different types. For example, it may be possible to convince some parents (particularly those from non-Western backgrounds) to listen exclusively to non-Western music at home, starting before birth and continuing until their infants participate in music perception experiments. These infants can then be compared to infants raised in homes exposed to Western music: Any commonalities that cannot be explained by similar musical input would be candidates for innate, music-relevant predispositions.
The idea of controlled-exposure studies can be extended further. At the current time, one cannot rule out the possibility that the precocious development of speech perception relative to music perception (cf. sections 7.2.4 and 7.3.2, first subsection) is simply due to differences in amount of exposure (cf. McMullen & Saffran, 2004). For example, in van de Weijer’s (1998) study, described in section 7.2.7 above, out of 18 days of an infant’s life that were fully recorded and analyzed, the total amount of musical exposure amounted to no more than a few minutes (van de Weijer, personal communication). Of course, this is a study of a single infant and may not be representative. It seems almost certain, however, that most infants and children hear a great deal more speech than music, and that the amount of variability in music exposure across individuals is far greater than the amount of variability in speech exposure. Thus to really test whether speech perception is precocious compared to music perception, one should work with infants who have a large amount of musical exposure. For example, one could work with children of musicians or music teachers who practice/teach at home, or with babies who participate in infant music programs. (In the United States, infant music programs exist at the Eastman School of Music and at the University of South Carolina.)
Ultimately, comparing linguistic and musical development on a level playing field will require quantifying and equalizing the amount of linguistic versus musical input to infants over extended periods of time. This will require long-term recordings (of the type used by van de Weijer) and the cooperation of parents in order to give infants equal amounts of linguistic and musical input. For example, using modern computing technology to automatically classify spoken versus musical segments in digitized recordings (Scheirer & Slaney, 1997), parents could be given feedback at the end of each day on how much spoken versus musical input (in minutes) their child heard that day. They could then modify their home environment to try to equalize this amount of input (e.g., via singing, CDs, videos). One could even start this sort of input matching before birth, at the time the fetus begins to hear. Infants who have been exposed to matched amounts of speech and music could then be tested for a variety of linguistic and musical abilities. For example, one could use ERPs and behavioral experiments to probe the development of learned sound categories and syntactic knowledge in the two domains. The key question, of course, is whether the development of linguistic abilities will still outstrip the development of musical abilities when input is matched. If so, this would suggest that natural selection has not provided us with innate predispositions for learning music.13
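Once an automatic classifier (in the spirit of Scheirer & Slaney, 1997) has labeled stretches of a day's recording as speech or music, the daily feedback to parents is simple bookkeeping. The following sketch is purely illustrative; the function name and the (day, label, seconds) data format are my own assumptions, not anything described in the text.

```python
from collections import defaultdict

def daily_feedback(segments):
    """Tally minutes of speech vs. music per day from labeled audio segments.

    `segments` is a list of (day, label, seconds) tuples, where `label`
    ("speech" or "music") is assumed to come from an automatic
    speech/music classifier run over the day's recording.
    """
    totals = defaultdict(lambda: {"speech": 0.0, "music": 0.0})
    for day, label, seconds in segments:
        totals[day][label] += seconds / 60.0  # accumulate in minutes
    return dict(totals)

# A hypothetical day mirroring van de Weijer's observation that speech
# exposure vastly outweighs musical exposure:
segments = [
    ("day1", "speech", 5400),  # 90 minutes of speech
    ("day1", "music", 300),    # 5 minutes of music
]
report = daily_feedback(segments)
# report["day1"] -> {"speech": 90.0, "music": 5.0}
```

Parents could use such a per-day summary to adjust the home environment (more singing, CDs, videos) toward matched input.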
In a sense, the experiment proposed above is simply an extended version of existing experiments on statistical learning. Such studies have shown that infants are adept at using distributional properties of the input (such as frequency of occurrence or co-occurrence of events) to form categories and infer structural properties of stimuli in both speech and music (e.g., Maye et al., 2002; Saffran et al., 1996; Saffran et al., 1999). These experiments involve controlled exposure to particular patterns, followed by tests to determine if statistical regularities in the input influence learning. These studies have the advantage of tight controllability, but they inevitably deal with infants that have had unmatched amounts of exposure to speech and music before coming into the laboratory. If one wants to address the evolutionary question of innate predispositions for music, then comparisons of music and language following a history of matched exposure provide an unparalleled, if logistically challenging, approach to this question.
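The distributional cue at the heart of these statistical learning studies can be made concrete with a toy computation of transitional probabilities over a syllable stream, in the style of Saffran et al. (1996): within-"word" syllable pairs have high transitional probability, and "word" boundaries have lower ones. The syllables and words below are hypothetical stand-ins, not the actual stimuli from those studies.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# A stream built from two made-up "words": bi-da-ku and pa-do-ti
stream = ["bi", "da", "ku", "pa", "do", "ti",
          "bi", "da", "ku",
          "bi", "da", "ku", "pa", "do", "ti"]
tp = transitional_probabilities(stream)
# Within-word pair ("bi", "da") has TP 1.0; the boundary pair ("ku", "pa")
# has a lower TP (2/3 in this stream) -- the cue infants are thought to exploit
# when segmenting continuous input.
```

Infant experiments then test whether listeners treat high-TP sequences as cohesive units, after controlled exposure to such a stream.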
Although there is little doubt that infants have musically relevant abilities, the key question from an evolutionary standpoint is whether these reflect mechanisms shaped by selection for music, or whether they are a byproduct of mechanisms used in language comprehension or in general auditory processing (cf. Trehub & Hannon, 2006).
Several of the findings in Box 7.1 (i.e., items 1, 2, 6, and 7) can likely be accounted for via speech processing. The ability to recognize the similarity of pitch patterns based on melodic contours would be very useful in speech perception because infants must learn to recognize when an intonation contour is “the same” (e.g., has emphasis on the same word, or expresses a similar affect), whether spoken by males, females, or children. Similarly, sensitivity to temporal patterns independent of tempo (e.g., sensitivity to duration ratios between successive events) would be useful for dealing with variations in speech rate (Trehub, 2000). Hemispheric asymmetries in infants for contour versus interval could reflect early asymmetries in the grain at which pitch patterns are analyzed in the two hemispheres, which could once again be related to speech (e.g., brains could come prepared for processing pitch patterns at two levels of detail, a fine-grained level for lexical tones in tone languages versus a more global scale for prosodic contours). With regard to “motherese” (or more correctly, infant-directed speech, because fathers and older siblings do it, too), there is evidence that this special form of speech has numerous phonological benefits to the infant, including increased acoustic contrastiveness between vowels (which could facilitate the formation of vowel categories; Kuhl et al., 1997), and facilitating the learning of consonantal contrasts in polysyllabic words (Karzon, 1985). Furthermore, the distinct pitch sweeps that characterize this form of speech and that are so salient to the infant (Fernald & Kuhl, 1987) appear to play specific roles in modulating infant attention and arousal (Papousek et al., 1990; Papousek, 1996). The existence of ID speech can thus be explained without any reference to music.
Other findings in Box 7.1 (i.e., items 3, 4, and 5) can likely be accounted for as byproducts of general auditory processing. For example, sensitivity to Gestalt auditory patterns is likely due to mechanisms for assigning incoming sounds to distinct sources (Bregman, 1990), rather than having anything specifically to do with music. The superior processing of certain pitch intervals (e.g., the fifth) may reflect the way pitch is coded in the vertebrate nervous system (Tramo et al., 2003), which in turn could reflect the evolution of mechanisms for auditory object recognition (e.g., partials separated by a fifth may be more likely to come from a single object, because so many organisms make harmonic sounds). One way to test this idea is to see if similar biases exist in the auditory processing of nonhuman animals, for example, primates and birds (I will return to this topic in section 7.3.5).
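The harmonicity argument above can be illustrated with a small sketch: two harmonic tones whose fundamentals stand in a 3:2 ratio (a perfect fifth) share several of their lower partials, whereas tones in an irrational ratio share none, so co-occurring partials a fifth apart are plausibly attributable to a single harmonic source. The particular frequencies below are arbitrary choices for illustration.

```python
def shared_partials(f1, f2, n=10):
    """Count coinciding harmonics among the first n partials of two tones."""
    harmonics1 = {round(f1 * k, 6) for k in range(1, n + 1)}
    harmonics2 = {round(f2 * k, 6) for k in range(1, n + 1)}
    return len(harmonics1 & harmonics2)

# Tones a perfect fifth apart (3:2): 200*3 = 600 = 300*2, 200*6 = 300*4, ...
fifth = shared_partials(200.0, 300.0)        # 3 shared partials among the first 10
# Tones an equal-tempered tritone apart (irrational ratio sqrt(2)): no overlap
tritone = shared_partials(200.0, 200.0 * 2 ** 0.5)   # 0 shared partials
```

On this view, a processing advantage for the fifth could fall out of general-purpose auditory object recognition rather than anything music-specific.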
Before concluding this section, it is useful to draw a conceptual distinction between two ways in which music and language can share developmental mechanisms. In one case, a mechanism specialized by evolution for language can be engaged by music whenever music employs structures or processes similar enough to language to activate these mechanisms. For example, the ability to form a system of learned sound categories in music—such as pitch intervals—may rely on mechanisms that enable the brain to learn the phonemic categories of speech (cf. Chapter 2, section 2.4). One might point out that there are dissociations of processing tonal and verbal material due to brain damage (e.g., “pure word deafness”), but such deficits simply show that the learned representations for musical and linguistic sound categories are stored in ways that allow for selective damage (for example, they may be stored in different brain areas). The learning mechanisms may still be the same. Evidence for overlap in learning mechanisms would come from associations between sound category learning in the two domains. For example, if performance in one domain predicts performance in the other, then this suggests common learning mechanisms (cf. Anvari et al., 2002; Slevc & Miyake, 2006). The second way in which music and language can share developmental mechanisms is if both draw on more general cognitive processes that are unique to neither language nor music. For example, general processes of analogical reasoning may be used in understanding discourse relations in both language and music (see Chapter 6, section 6.3.2).
Discussions of genetics and music often revolve around the debate over innate musical talent (e.g., Howe et al., 1998; Winner, 1998), a debate that continues today with the added dimension of neuroimaging (Norton et al., 2005). Our concern here, however, is not with musical giftedness but with basic musical abilities, in other words, those musical abilities that are widespread in the population. At first blush, it may seem that any demonstration that genes influence such abilities would be evidence of natural selection for music. A moment’s reflection, however, suggests otherwise. To take one example, there is a known single-gene mutation that has a profound influence on musical (vs. linguistic) ability, effectively disrupting music perception while leaving the potential for normal language processing intact. This is because the gene results in deafness (Lynch et al., 1997). Deaf individuals can have normal linguistic function (using sign language), but are excluded from pitch-related musical abilities. Thus although this gene influences music cognition, its existence obviously provides no evidence that music has been a direct target of natural selection.
Of course, we are not interested in such trivial cases, but the deafness example serves to highlight a key question for any study of gene-music relations, namely, how specific is the link between genes and the trait of interest? A related question is, what are the mechanisms linking genes to the trait of interest? For those interested in probing the links between genes, music, and evolution, these two questions must figure prominently in thinking about the relations between genetics and musical ability.
This section discusses two musical phenotypes that have attracted interest from geneticists. The first is musical tone deafness, which has been mentioned at various points throughout this chapter. Modern research on twins has suggested that specific genes put one at risk for this disorder. The second phenotype is absolute pitch (AP), defined here as the ability to accurately classify heard musical pitches into fine-grained categories (e.g., using the 12 note names of Western music, such as A and C♯) without any external reference. There is suggestive, though as yet not conclusive, evidence that genes play a role in determining which individuals have AP, and genetic studies are currently underway.
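The definition of AP above amounts to a mapping from a heard frequency to one of 12 fine-grained pitch-class categories. A minimal sketch of that mapping, assuming equal temperament and A4 = 440 Hz (standard conventions, not details given in the text), is:

```python
import math

# The 12 Western pitch-class names, ordered upward in semitones from A
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def classify_pitch(freq_hz, ref_a4=440.0):
    """Map a frequency to the nearest of the 12 Western note names.

    Semitones above A4 = 12 * log2(f / 440); rounding to the nearest
    integer mimics assignment to a fine-grained pitch category, done
    here (unlike by an AP possessor) with an explicit reference.
    """
    semitones = round(12 * math.log2(freq_hz / ref_a4))
    return NOTE_NAMES[semitones % 12]

# classify_pitch(440.0) -> "A"; classify_pitch(277.18) -> "C#"
```

The computation is trivial with a reference frequency in hand; what makes AP remarkable is that possessors perform the classification with no external reference at all.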
Can such studies help shed light on evolutionary issues? To put this question in the broader context of research on behavioral genetics, I will first discuss research on the genetics of language. This research helps highlight issues that studies of genetics and music must also face. It also contains an important lesson about the difference between popularized accounts of human gene-behavior links (which often imply simple one-to-one mappings between genes and high-level cognitive functions) and the complex reality that emerges when phenomena are studied in more detail.
The example from language concerns a British family (the “KE” family) that provided the first clear link between a single gene and a developmental disorder of speech and language.14 About half of the members of this family have a mutation in a gene on chromosome 7. This gene, which has been sequenced and is known as FOXP2, is subject to simple inheritance. It is an autosomal dominant, meaning that one damaged copy leads to the disorder.15 An early description of the disorder focused on problems that affected family members have with grammar (Gopnik, 1990; cf. Gopnik & Crago, 1991), triggering a great deal of media attention and speculation about a “grammar gene.” It is now clear, however, that affected members have a broad spectrum of speech and language deficits, including orofacial dyspraxia (i.e., problems controlling coordinated face and mouth movements), difficulties in distinguishing real words from nonwords, and difficulties manipulating phonemes (Vargha-Khadem et al., 1995; Alcock et al., 2000a; Watkins et al., 2002). Furthermore, although verbal IQ suffers more than nonverbal IQ in affected individuals, these individuals score lower than do unaffected family members on nonverbal IQ tests (on average; there is overlap in the distributions, with some affected members being in the normal range). Of special interest to researchers in music cognition, it has been demonstrated that affected members are as good as unaffected members on tests of musical pitch, but significantly worse on tests of musical rhythm (Alcock et al., 2000b). (From the standpoint of modern music cognition, Alcock used very simple tests of pitch and rhythm abilities; a great deal more remains to be done to characterize the musical abilities of affected vs. unaffected family members.)
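Autosomal dominant inheritance, as described above, predicts that roughly half the children of an affected heterozygote and an unaffected partner will be affected, consistent with about half of the KE family carrying the disorder. A toy simulation makes the arithmetic concrete; the genotype labels ("D" for a damaged FOXP2 copy, "+" for an intact one) are my own notation for illustration.

```python
import random

def offspring_affected(affected_parent=("D", "+")):
    """Simulate one child of an affected heterozygote (D/+) and an
    unaffected (+/+) partner. The unaffected parent always contributes
    "+", so the child's status depends only on which allele the
    affected parent passes on; one damaged copy suffices (dominance)."""
    inherited = random.choice(affected_parent)
    return inherited == "D"

random.seed(0)
trials = 100_000
rate = sum(offspring_affected() for _ in range(trials)) / trials
# rate is close to 0.5, the expected proportion of affected offspring
```

The same 50% expectation is what made the KE pedigree such clear evidence for a single-gene, simple-inheritance disorder.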
Thus the notion that affected members of the KE family have a specific language deficit is untenable. This answers our first key question, “How specific is the link between genes and the trait of interest?” However, a negative answer to this question does not teach us much. What we would like to know is whether the spectrum of deficits seen in affected members can be attributed to a common underlying deficit. This leads immediately to our second key question, “What are the mechanisms linking genes to the trait of interest?” It is known that FOXP2 codes for a DNA-binding protein, and is therefore a gene that regulates other genes (see Marcus & Fisher, 2003, for a review). The gene occurs in a wide variety of species, including chimpanzees, mice, birds, and crocodiles, and studies in other species are beginning to provide some clues about what FOXP2 does. In particular, studies of FOXP2 expression in birds that do versus do not learn their song suggest that FOXP2 expression in specific brain regions is related to learning of song sequences (Haesler et al., 2004; cf. Teramitsu et al., 2004).16 Molecular biology research has shown that the basal ganglia are an important site for FOXP2 expression (Lai et al., 2003; Haesler et al., 2004). The basal ganglia are subcortical structures involved in motor control and sequencing, as well as in higher cognitive functions such as syntactic processing (DeLong, 2000; Kotz et al., 2003; Figure 7.1).
Structural brain imaging of the affected KE family members has revealed abnormalities in the basal ganglia, among other regions (Vargha-Khadem et al., 1998), and functional brain imaging reveals that the basal ganglia are among the regions that are underactivated during language tasks (Liégeois et al., 2003). Given the deficits that affected members have with complex temporal sequencing in both oral movements and in nonverbal musical rhythmic patterns, one wonders whether the underlying deficit is one of fine sequencing and timing. For example, one might imagine that FOXP2 somehow affects the temporal properties of neurons, which in turn influences neural network dynamics. Damage to this gene could thus make it difficult for networks to handle complex sequencing tasks. An important question is whether such deficits could influence the development of language abilities at multiple levels, from motor to cognitive. In fact, recent research and theorizing in neurolinguistics has implicated the basal ganglia in both motor and cognitive linguistic operations (Lieberman, 2000, Ch. 4). (As Lieberman notes, large parts of the basal ganglia project to nonmotor areas of cortex.)
Figure 7.1 Schematic diagram showing the location of the basal ganglia in the human brain. The basal ganglia are subcortical structures important to both language and music (cf. sections 7.3.4 and 7.5.3), and have reciprocal connections to large areas of the cerebral cortex. (The locations of two other subcortical structures, the thalamus and the amygdala, are also shown for anatomical reference.)
In summary, studies of the KE family have helped scientists refine what they mean by a gene “for language.” Although FOXP2 is crucial for the normal development of language, its effects are not specific to language. The real scientific question is thus about mechanism, in other words, in discovering the role this gene plays in the development of circuits involved in language processing. More generally, FOXP2 studies help us conceptualize the complex link between genes and behaviors. Rarely, if ever, is this a matter of simple one-to-one mappings between genes, neural circuits, and behaviors. Instead, the picture in Figure 7.2 is more representative of how biological systems work (Greenspan, 1995; 2004; Balaban, 2006).
The lesson for music is to be cautious about any report that a gene (or genes) has a specific effect on music. It is essential to ask what other abilities are also influenced and what mechanism leads to the observed effect. Only then can one address the evolutionary significance of the gene-behavior link. In the case of FOXP2, the lack of sequence variation in the human version of this gene suggests that this gene has been a target of selection. However, knowing what FOXP2 does profoundly influences our understanding of evolutionary issues. Thus, for example, if FOXP2 is involved in building circuits that do complex sequencing operations (both motor and cognitive), and if speech, language, and musical rhythm all draw on these operations, then any of these abilities could have been the target of selection. One must therefore rely on other evidence (such as the lack of biological cost for musical deficits) in order to decide which of these was a direct target of selection and which “came along for the ride.”
Figure 7.2 Conceptual diagram of the relations between genes, neural circuits, and behaviors. The mapping between genes and neural circuits, and between circuits and behaviors, tends to be many to many rather than one to one. Furthermore, there are important interactions between genes and between circuits. From Greenspan & Tully, 1994.
In the late 1800s, Allen (1878) described an individual who could not recognize familiar melodies, carry a tune, or discriminate gross pitch changes in piano tones, though he was well educated and had received music lessons in childhood. Allen called this condition “note-deafness.” Over 125 years later, musical tone deafness is receiving new research interest, with tools from modern cognitive science and neuroscience being brought to bear (Drayna et al., 2001; Ayotte et al., 2002; Foxton et al., 2004; Hyde et al., 2004). Ayotte et al. (2002) conducted a useful group study of individuals with musical tone deafness, which they term “congenital amusia.” (Other researchers have used other terms, such as tune-deafness and dysmelodia. I prefer musical tone deafness and will use the acronyms mTD for “musical tone deafness” and mTDIs for “musically tone deaf individuals”). Ayotte et al. recruited mTDIs via advertisement, and selected those who had “(i) a high level of education, preferably university level, to exclude general learning disabilities or retardation; (ii) music lessons during childhood, to ensure exposure to music in a timely fashion; (iii) a history of musical failure that goes back as far as they could remember, to increase the likelihood that the disorder is inborn; and (iv) no previous neurological or psychiatric history to eliminate an obvious neuro-affective cause.” The researchers tested the mTDIs on a variety of musical tasks, such as melody discrimination and tapping to a musical beat, and found severe impairments. One test in particular made a clean discrimination between mTDIs and control subjects. In this test, participants heard one brief melody at a time and indicated whether it contained a “wrong note” or not. (Wrong notes were out-of-key or “sour” notes created by a 1-semitone shift of an original note in a manner that preserved the melodic contour.) 
The mTDIs were severely impaired on this task, whether familiar or unfamiliar melodies were used (though like controls, they did better on familiar melodies). Ayotte et al. went on to show that mTD did indeed seem selective to music. Recognition of nonmusical sounds and perception of speech intonation, for example, appeared to be spared.
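For readers who want a concrete picture of the stimulus manipulation, here is a minimal sketch in Python. It is my reconstruction from the verbal description above, not Ayotte et al.'s actual procedure: it assumes C major and a toy tune, and represents melodies as lists of MIDI note numbers.

```python
# Sketch of the "wrong note" manipulation: shift one note of a melody by
# 1 semitone so that it falls outside the key, while preserving the melodic
# contour (the up/down/same pattern between successive notes).
# The key and tune are illustrative assumptions, not the study's stimuli.

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C-major scale

def contour(melody):
    """Up (+1) / down (-1) / same (0) pattern between successive notes."""
    return [(b > a) - (b < a) for a, b in zip(melody, melody[1:])]

def wrong_note_variants(melody):
    """All single-note 1-semitone shifts that leave the key but keep the contour."""
    variants = []
    for i, note in enumerate(melody):
        for shift in (-1, +1):
            v = melody[:i] + [note + shift] + melody[i + 1:]
            out_of_key = (note + shift) % 12 not in C_MAJOR
            if out_of_key and contour(v) == contour(melody):
                variants.append(v)
    return variants

# Example: opening of "Twinkle, Twinkle" in C major (MIDI note numbers)
tune = [60, 60, 67, 67, 69, 69, 67]
for v in wrong_note_variants(tune):
    print(v)
```

With this toy tune, only the final note can be shifted without disturbing the contour, which illustrates how constraining the contour-preservation requirement is.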
As noted earlier in this chapter, Kalmus and Fry (1980) showed that mTD tends to run in families, and a more recent study found that identical twins (who share 100% of their genes) resemble each other more closely on tests of tone deafness than do fraternal twins (who share only 50% of their genes on average). These findings suggest that there is a specific gene (or genes) that puts one at risk for this disorder (Drayna et al., 2001). Is this, then, evidence for a “music gene,” which would imply a role for natural selection in the evolution of musical abilities?
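The logic of the twin comparison can be made concrete with Falconer's classic formula, h² ≈ 2(r_MZ − r_DZ): because identical twins share roughly twice the genetic overlap of fraternal twins, doubling the difference in their trait correlations estimates the genetic share of the variance. The sketch below is illustrative only; the correlation values are hypothetical, not figures from Drayna et al. (2001).

```python
# Falconer's formula: a rough heritability estimate from twin correlations.
# MZ (identical) twins share ~100% of their genes, DZ (fraternal) twins ~50%,
# so h^2 = 2 * (r_MZ - r_DZ). The correlations below are made up for
# illustration; they are not data from any tone-deafness study.

def falconer_h2(r_mz, r_dz):
    """Heritability estimate h^2 = 2 * (r_MZ - r_DZ), clipped to [0, 1]."""
    return max(0.0, min(1.0, 2.0 * (r_mz - r_dz)))

# Hypothetical tone-deafness test-score correlations:
h2 = falconer_h2(r_mz=0.79, r_dz=0.46)
print(f"estimated heritability: {h2:.2f}")  # 2 * (0.79 - 0.46) = 0.66
```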
Given what we have learned about the relation between genes and behavior in language, a key question is, how selective is the deficit to music? A number of studies (e.g., Hyde & Peretz, 2004; Foxton et al., 2004; Patel, Foxton, & Griffiths, 2005) have shown that mTDIs have problems with basic pitch-change detection and pitch-direction determination skills, outside of a musical context (see Chapter 4, section 4.5.2, subsection “Melodic Contour Perception in Musical Tone Deafness”). At present, we do not know which genes predispose individuals to mTD, or what mechanisms link the genes to the trait. However, for the sake of argument, let us assume that these genes are involved in the formation of auditory neural maps involved in pitch-change detection and pitch-direction discrimination, and that disruption of these genes leads to a slight change in patterns of connectivity between neurons in these maps during development. This in turn could lead to elevated thresholds for discerning when pitch changes and the direction in which it changes. Due to these elevated thresholds, individuals would receive a degraded version of the ambient musical input, so that normal cognitive representations of musical pitch would not develop (cf. Chapter 5). In contrast, speech intonation perception may be largely robust to this deficit, because most perceptually relevant pitch movements are above these thresholds (cf. Chapter 4, section 4.5.2, subsection “The Melodic Contour Deafness Hypothesis,” for further discussion).
I suspect that just as with the case of language and FOXP2, genetic research on mTD will turn up a gene (or genes) that is not specific to musical ability but that is crucial for normal musical development. Only after the mechanism of the gene is understood can evolutionary questions be meaningfully asked. Given our current understanding of the basic pitch-processing deficits in mTD, however, it does not seem that this disorder and its genetic basis will provide any evidence that humans have been shaped by natural selection for music.
Few musical abilities appear as categorical in terms of presence versus absence as absolute pitch (AP), defined here as the ability to accurately classify heard pitches into fine-grained categories (e.g., using the 12 note names of Western music, such as A and C♯) without any external reference. For example, a musician with AP (and trained in Western music) can effortlessly and rapidly call out the names (or “chroma”) of individual pitches as they are heard: C-sharp, A-flat, G, and so forth. In contrast, non-AP musicians must rely on relative pitch to name notes, in other words, they must gauge the distance of the target pitch from some known standard, typically a much more effortful and error-prone endeavor. AP can seem an almost magical ability, particularly because AP musicians often recall making no special effort to acquire it. Little wonder, then, that AP is sometimes thought of as a gift that one simply gets from chance factors such as genes.
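The classification an AP listener performs can be expressed as a simple computation. The sketch below is purely illustrative (it is not a model of the perceptual mechanism); it assumes the equal-tempered A4 = 440 Hz tuning convention.

```python
# What an AP listener does implicitly, expressed as a computation: map a
# frequency to the nearest equal-tempered pitch and report its chroma
# (note name), with no external reference beyond the A4 = 440 Hz convention.
# This is an illustrative sketch, not a perceptual model.

import math

CHROMA = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_of(freq_hz, a4=440.0):
    """Name of the nearest equal-tempered pitch class for a frequency."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))  # 69 = MIDI number of A4
    return CHROMA[midi % 12]

print(chroma_of(440.0))   # A
print(chroma_of(277.18))  # C# (C-sharp 4)
print(chroma_of(415.30))  # G# (A-flat 4)
```

A non-AP musician, by contrast, would need a reference tone and would compute the interval from it, which is the relative-pitch strategy described above.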
AP has been investigated for over 120 years (Stumpf, 1883) and has received much more research attention than musical tone deafness. Thus we know a good deal about the AP phenotype (see Takeuchi & Hulse, 1993; Ward, 1999; and Levitin & Rogers, 2005, for reviews). For example, it is known that AP musicians do not have better frequency resolution abilities than non-AP musicians (Hyde et al., 2004), and there is no evidence to suggest that AP possessors are superior musicians in general. (There are many great composers who did not have AP, such as Wagner and Stravinsky.) In fact, AP appears to have an elevated incidence in certain neurological disorders, such as blindness, Williams syndrome, and autism (Chin, 2003). Thus AP should not be confused with musical giftedness, though of course AP can be an asset to a musician.
Why is AP thought to have a relationship to genes? This supposition rests on two findings. First, Profita and Bidder (1988) reported familial aggregation of AP, a finding that has been replicated by subsequent work (Baharloo et al., 1998, 2000). Second, based on large surveys of music students in the United States, Gregersen and colleagues (1999, 2000) have shown that the incidence of AP is significantly higher in Asian students (Japanese, Korean, and Chinese) than in Western students (i.e., ~45% vs. 10% of music students who responded to the survey, though see Henthorn & Deutsch, 2007). The authors also found that Asians were more likely to engage in programs that encouraged the development of AP, but concluded that differences in training could not fully account for the asymmetries they observed.17
Of course, none of this is direct evidence that genes influence the development of AP, and work is currently underway to try to answer this question with certainty (Gitschier et al., 2004). It is evident, however, that even if genes do play a role in AP, environment also plays a very important role. Two salient environmental factors are the age of onset of musical training (cf. section 7.3.2) and the type of musical training (Gregersen et al., 2000). AP may differ quite sharply from musical tone deafness in this respect, because the latter may be impervious to experience (Hyde & Peretz, 2004).
The rarity of the AP phenotype, and the fact that it is unnecessary for the development of normal musical ability, indicate that finding a genetic underpinning for AP would provide no evidence that humans have been shaped by natural selection for music. However, it may be that AP will prove an interesting model for studying genotype-phenotype interactions in humans (Zatorre, 2003). Future work in this area will benefit from finding creative ways to test for AP in nonmusicians. Although AP is usually demonstrated by being able to name pitches, the ability to remember and classify pitches should be conceptually distinguished from the ability to give them musical labels. For example, Ross et al. (2003) report AP-like abilities in a nonmusician in the context of a pitch perception task that usually separates AP from non-AP individuals. In this task, a listener hears one pitch (the target), a group of interpolated pitches, and then another pitch (the probe), and must decide if the probe and target are the same or not. Individuals with AP find this task easy, because they can simply compare their verbal labels for the target and probe. Individuals without AP find the task very difficult, because of the interfering effect of the interpolated tones on pitch memory (Deutsch, 1978). Ross found a young nonmusician who performed like AP individuals on this task, despite a lack of knowledge of the standard note names of the tones. (For another method that could be used to test AP in nonmusicians, see Weisman et al., 2004.)
The notion that some nonmusicians could have AP is bolstered by the fact that a simple form of AP based on memory for the absolute pitch of familiar tunes is widespread among nonmusicians (Levitin, 1994; Schellenberg & Trehub, 2003). Furthermore, non-AP musicians perform above chance on AP tests, suggesting that the distinction between AP and non-AP may not be as clear-cut as usually thought (Lockhead & Byrd, 1981; though it is difficult to rule out the use of relative pitch cues by non-AP musicians in these tests). Assuming for the moment that the potential to acquire musical AP is widespread, the challenge for behavior genetics is to explain why AP develops in some individuals but not others even when environmental variables are matched (e.g., age of onset of musical training and type of training). For example, is this due to genes that influence the neural substrates of pitch perception (Zatorre, 2003), or to genes that influence “cognitive style” (Brown et al., 2003; Chin, 2003)? Although the behavior genetics of AP will doubtless prove an interesting enterprise, the relevant point here is that this research is orthogonal to the question of AP’s relevance to evolutionary issues. As noted above, this relevance is basically nil.
The introduction of this chapter argued that animal communication, even when we term it “song,” is not the same as music. However, whether or not animals exhibit musical predispositions akin to humans, or can learn specific aspects of human music, are interesting questions from an evolutionary standpoint. Because animals do not have music, whatever music-relevant behaviors they exhibit or can learn are clearly not a result of selection for music, but reflect more general aspects of auditory processing. Thus studies of animals can help shed light on what aspects of music processing are uniquely human and thus in need of evolutionary explanation (McDermott & Hauser, 2005).
Exactly the same strategy has been used with animal studies of speech and language. For example, early research by Kuhl and Miller (1975) showed that chinchillas exhibited categorical perception for certain speech sound contrasts (cf. Chapter 2, section 2.4.1), an ability that was previously thought to be a uniquely human adaptation for speech perception. More recently, Hauser et al. (2001) have shown that cotton-top tamarin monkeys show statistical learning of syllable transition probabilities in a manner akin to human infants (Saffran et al., 1996; cf. Newport et al., 2004). In contrast, Fitch and Hauser (2004) showed that these same monkeys lack the ability to learn recursively structured sequences, a finding related to the hypothesis that recursion is a uniquely human ability related to language (cf. Hauser et al., 2002; but see Gentner et al., 2006, for data on birds that challenge this hypothesis).
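The transition-probability computation attributed to infants and tamarins can be sketched in a few lines. The syllable stream and "words" below are my own toy example in the style of Saffran et al., not the actual experimental stimuli.

```python
# Statistical learning of syllable transitions, sketched: estimate the
# transition probability P(next | current) from a continuous syllable stream.
# Within a "word," transitions are perfectly predictable (P = 1.0); across
# word boundaries they drop (~1/3 here), which is the cue that marks the
# boundary. The trisyllabic "words" are hypothetical, in the style of
# Saffran et al. (1996), not their actual stimuli.

from collections import Counter
import random

words = ["tupiro", "golabu", "bidaku"]
syllabify = lambda w: [w[i:i + 2] for i in range(0, len(w), 2)]

random.seed(0)
stream = []
for _ in range(300):  # a continuous stream of 300 randomly ordered words
    stream += syllabify(random.choice(words))

pair_counts = Counter(zip(stream, stream[1:]))
syll_counts = Counter(stream[:-1])

def transition_prob(a, b):
    """P(b immediately follows a), estimated from the stream."""
    return pair_counts[(a, b)] / syll_counts[a]

print(transition_prob("tu", "pi"))  # within-word: 1.0
print(transition_prob("ro", "go"))  # across a word boundary: roughly 1/3
```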
In the following sections, I briefly review three areas of research on animals and music, and discuss the significance of this work to evolutionary issues. (These sections focus on pitch; a discussion of animal abilities with regard to musical rhythm is reserved for section 7.5.3.) Before embarking, it is worth noting that there is no shortage of anecdotal evidence for animal appreciation of music. Here is my favorite example, which came in a letter from a linguist in the summer of 2004:
My late Irish setter responded most remarkably to radio broadcasts of chamber music by Beethoven and Schubert, though more to the works of Beethoven’s middle period than those of his late period (which I had preferred). In consequence I started listening more closely to the works of the middle period and came to appreciate them more than I previously had. It is relatively rare, I think, to have one’s aesthetic sensibilities significantly enhanced by one’s dog.
I’m sure there is no shortage of animal music stories (though perhaps none as charming as this one), but as the scientific adage goes, “the plural of anecdote is not data.”
Research on animal pitch perception has revealed that absolute pitch is not a uniquely human ability. Indeed, it seems to be the preferred processing mode of several bird species (see Hulse et al., 1992 for a review). In these studies, various ways of testing AP have been devised that do not require verbal labels. In one study, Weisman et al. (2004) used a paradigm in which a frequency range of about 5,000 Hz was divided into eight equal-sized bands with five test tones per band. Birds (zebra finches and two other species) received positive reinforcement for responding to frequencies in four of the bands and negative reinforcement for responding to frequencies in the other bands. After training, birds showed extremely good discrimination, responding predominantly to tones in the positively reinforced regions. Thus they successfully classified heard pitches into fine-grained categories without an external reference, fulfilling our definition of AP. (Human nonmusicians were also trained using this procedure, with money as the reinforcement, but performed miserably on the task.) The significance of this finding is that it shows that AP does not depend on human-specific brain abilities, and is thus unlikely to reflect selection for music (cf. section 7.3.4, subsection “Genetics and Absolute Pitch”).
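The band design is easy to make concrete. The frequency values and band layout below are my reconstruction from the verbal description, not Weisman et al.'s exact parameters.

```python
# Sketch of the band-discrimination design: an ~5,000 Hz range split into
# eight equal bands, with responses to tones in alternating bands rewarded
# (S+) and the rest unrewarded (S-). The specific range and band boundaries
# here are assumed for illustration, not the study's actual values.

LOW, HIGH, N_BANDS = 500.0, 5500.0, 8
BAND_WIDTH = (HIGH - LOW) / N_BANDS  # 625 Hz per band

def band_of(freq_hz):
    """0-based band index for a frequency inside the training range."""
    return min(N_BANDS - 1, int((freq_hz - LOW) // BAND_WIDTH))

def rewarded(freq_hz):
    """True for tones in the positively reinforced (even-numbered) bands."""
    return band_of(freq_hz) % 2 == 0

# A bird that solves this task must use absolute pitch: 2900 Hz (S-) and
# 3100 Hz (S+) fall in adjacent bands, and no relative-pitch cue is available
# because tones are presented in isolation.
print(band_of(2900.0), rewarded(2900.0))
print(band_of(3100.0), rewarded(3100.0))
```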
As noted by McDermott and Hauser (2005), perhaps more remarkable than nonhuman animals’ facility with AP is their lack of facility with relative pitch compared with humans’. Although animals can use relative pitch as a cue (e.g., to judge if a tone sequence is rising or falling independent of its absolute pitch range; Brosch et al., 2004), this often requires extensive training. This contrasts sharply with human perception. Even infants readily recognize the similarity of the same melodic contours in different pitch ranges (Trehub et al., 1984), and remember melodies based on patterns of relative rather than absolute pitch (Plantinga & Trainor, 2005). This suggests that natural selection may have modified the human auditory system to favor relative pitch processing, a modification that may have its origins in speech intonation perception (cf. section 7.3.3, subsection “Music and Infancy: Questions of Specificity”).
Animal studies have shown that the ability to discriminate consonant from dissonant pitch intervals is not unique to humans. For example, Izumi (2000) trained Japanese macaques to discriminate the consonant interval of an octave from a dissonant major seventh, and then used a transfer test to show that the monkeys could generalize this discrimination to novel consonant versus dissonant intervals (cf. Hulse et al., 1995). (Note that all studies discussed in this section involved simultaneous intervals constructed from complex tones, in other words, tones with a fundamental and upper harmonics. See Chapter 2, appendix 3, for background on consonance and dissonance in complex tones.) Izumi noted that because the monkey auditory system did not evolve for music appreciation, the ability to discriminate consonance from dissonance may reflect mechanisms of perceptual organization related to segregating incoming sounds into sources as part of auditory scene analysis (cf. Bregman, 1990). Further evidence that the perception of consonance versus dissonance is not unique to humans comes from neural research by Fishman and colleagues (2001), who have shown that consonant and dissonant intervals generate qualitatively different patterns of neural activity in monkey auditory cortex. Thus it is likely that our sense that certain intervals are inherently rougher sounding than others (or have a less unified sense of pitch) is not uniquely human, but is a result of general properties of the vertebrate auditory system.
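One standard way to make the consonance/dissonance distinction concrete is via frequency ratios: consonant intervals correspond to small-integer ratios, whose partials align, whereas dissonant intervals do not, producing beating and roughness between nearby partials. The sketch below is a textbook account offered for illustration, not Izumi's or Fishman's analysis.

```python
# Consonance vs. dissonance via just-intonation frequency ratios (a standard
# textbook illustration, not an analysis from the studies cited above).
# A crude consonance index: the sum of numerator and denominator of the
# interval's ratio (smaller = more consonant).

from fractions import Fraction

INTERVALS = {
    "octave":        Fraction(2, 1),    # consonant
    "perfect fifth": Fraction(3, 2),    # consonant
    "major seventh": Fraction(15, 8),   # dissonant
    "minor second":  Fraction(16, 15),  # dissonant
}

def ratio_complexity(ratio):
    """Sum of numerator and denominator: smaller means more consonant."""
    return ratio.numerator + ratio.denominator

for name, r in sorted(INTERVALS.items(), key=lambda kv: ratio_complexity(kv[1])):
    print(f"{name:14s} {r}  complexity={ratio_complexity(r)}")
```

On this index the octave and fifth (the macaques' consonant training intervals) rank far below the major seventh, matching the discrimination described above.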
Animals may be able to discriminate consonance from dissonance, but do they prefer one to the other? Two-month-old human infants show a preference for consonant intervals (Trainor, Tsang, & Cheung, 2002), raising the question of whether this is a human-specific trait. McDermott and Hauser (2004) addressed this question with an elegant paradigm. They constructed a V-shaped maze in which each arm of the V contained a single concealed audio speaker. Each speaker produced a different sound. A cotton-top tamarin monkey was released into the maze, and depending on which arm it chose to sit in, it heard one or the other sound (the monkey was free to move back and forth between the two arms of the maze). McDermott and Hauser first used a loud and a soft noise to test their method. All monkeys spent most of their time on the side that played the soft noise, showing that the monkeys could use the maze to indicate a behavioral preference. The researchers then ran an experiment in which the two speakers played sequences of consonant or dissonant intervals (consonant intervals were the octave, fifth, and fourth; dissonant were minor seconds, tritones, and minor ninths). This time, the monkeys showed no preference. The contrast between this finding and the preferences of 2-month-old humans suggests that preference for consonance may be uniquely human, raising the possibility that our auditory system has been shaped by selection for music.
A good deal more work is needed, however, before this conclusion could be reached. First, it would be important to replicate the primate findings with chimpanzees, who are much more closely related to humans than cotton-top tamarins (our last common ancestor with chimpanzees vs. tamarins was about 6 million vs. 40 million years ago). Second, it would be important to test animals that use learned, complex sounds in their communication, such as songbirds.
There is preliminary evidence that songbirds prefer consonant over dissonant sounds (Watanabe & Nemoto, 1998), which would indicate that this preference may be a byproduct of an auditory system adapted to a rich acoustic communication system. Finally, research on human infants would have to show that the preference for consonance was not due to prior exposure, by doing cross-cultural research or controlled-exposure studies (cf. section 7.3.3, subsection “Music and Infancy: Questions of Innateness”). If the difference between humans and other animals still persists after these more tightly controlled studies are conducted, then the burden will be on those who favor a nonadaptationist view of music to explain the origins of a human preference for consonance.
There is abundant evidence that adults in Western culture show superior processing of tonal versus atonal melodies, in other words, melodies that follow the conventions of scale and key in Western music (cf. Chapter 5). Infants do not show this same asymmetry (e.g., Trainor & Trehub, 1992). They do, however, perform better at detecting changes in pitch patterns when those changes disrupt certain pitch intervals such as the perfect fifth (Schellenberg & Trehub, 1996). Young infants also show octave equivalence, treating tone sequences separated by an octave as similar (Demany & Armand, 1984). These findings, combined with the near universal use of the octave and fifth in organizing musical scales, suggest that certain pitch intervals have a favored status in human auditory processing, either due to innate properties of the auditory system or to early experience with harmonically structured sounds (such as speech, cf. Chapter 2, appendix 3). A question that naturally arises about these predispositions is whether they are limited to humans.
Wright et al. (2000) conducted a study in which two rhesus monkeys were tested to see if they treated octave transpositions of melodies as similar. The research capitalized on a clever paradigm that allowed the monkeys to express whether they perceived two stimuli as the same or different. In this paradigm, a monkey sits in a small booth with a speaker in the wall directly ahead of it, and a speaker in the wall to either side. A first sound is presented from the central speaker, followed by a second sound from both side speakers. If the first and second sounds are the same, then a touch to the right speaker elicits a food reward. If the sounds are different, a touch to the left speaker produces the reward.
Using this method, the researchers investigated octave equivalence. If a monkey perceives pitches separated by an octave as similar, they will be more likely to respond “same” to transpositions of an octave than to other transpositions. Wright et al. found that when Western tonal melodies were used as stimuli (e.g., from children’s songs), the monkeys did indeed show evidence of octave equivalence, treating transpositions of 1 or 2 octaves as more similar to the original melody than transpositions of 1/2 octave or 1.5 octaves. This provided evidence for octave equivalence in monkeys, suggesting that the musical universal of octave equivalence has its roots in basic auditory processing mechanisms (cf. McKinney & Delgutte, 1999). (Interestingly, and in parallel to human studies, the monkeys did not show octave equivalence to isolated pitches.)
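The transposition logic behind this test can be sketched directly. The melody and the chroma-overlap measure below are my own illustration, not Wright et al.'s materials: a melody shifted by a whole number of octaves keeps every pitch chroma, while a 1/2-octave (6-semitone) shift changes all of them.

```python
# Octave equivalence, sketched: transposing a melody by 12 semitones
# (1 octave) preserves every pitch class (chroma), whereas transposing by
# 6 semitones (1/2 octave) preserves none. Melody and measure are
# illustrative, not the study's stimuli.

def transpose(melody, semitones):
    """Shift every MIDI note number by the given number of semitones."""
    return [n + semitones for n in melody]

def shared_chroma(m1, m2):
    """Fraction of positions where the two melodies share a pitch class."""
    return sum(a % 12 == b % 12 for a, b in zip(m1, m2)) / len(m1)

tune = [60, 62, 64, 65, 67]  # a C-major fragment, in MIDI note numbers
print(shared_chroma(tune, transpose(tune, 12)))  # 1.0: full chroma overlap
print(shared_chroma(tune, transpose(tune, 6)))   # 0.0: no chroma overlap
```

A listener (human or monkey) sensitive to chroma should therefore judge the 1- and 2-octave transpositions as "same" far more often than the 1/2- or 1.5-octave shifts, which is the pattern the monkeys showed with tonal melodies.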
Wright et al. also conducted another experiment that has attracted a good deal of attention. They tested the monkeys for octave equivalence using novel tonal versus atonal melodies, and found that the monkeys did not show octave equivalence when atonal melodies were used.18 To some, this might suggest that tonality is innately favored by the auditory system. However, close inspection of their data shows that the situation is not so clear-cut. Octaves were treated as more similar than 1/2 octaves for both tonal and atonal melodies. Furthermore, in simple same-different discrimination trials in which melodies were not transposed, the monkeys showed superior discrimination for tonal melodies. These findings converge to suggest that the monkeys found atonal melodies harder to remember than tonal ones, which may in turn have been due to incidental exposure to music in captivity. Hauser and McDermott (2003) have noted that monkey facilities often have a TV, and in their own research they have shown that monkeys are adept at statistical learning, which is currently thought to be an important mechanism for tonality learning (Krumhansl, 1990, 2005). Thus the results of Wright et al. cannot be taken as evidence that Western tonality is somehow innately favored by the auditory system. For those interested in addressing this issue, it would be highly desirable to replicate the study of Wright et al. with monkeys (or other animals) whose acoustic environment was more tightly controlled.
An obvious question that arises from the Wright et al. study is whether monkeys would show a processing advantage for pitch patterns based on the perfect fifth, as young human infants do. If this is the case (especially for monkeys raised in acoustically controlled environments), this would provide evidence that the widespread use of the fifth in human music is based on basic auditory processing mechanisms rather than being a product of selection for music.
Wright et al.’s use of tonality as a variable in their research points the way to many studies one could do with animals with regard to the ease with which they learn tonal versus atonal sequences. For example, one set of animals could be trained to discriminate tonal sequences, whereas another set is trained to discriminate atonal sequences that have similar Gestalt properties (e.g., contour patterns and overall interval distributions). Number of trials to criterion could be studied (are tonal sequences easier to learn?), as could generalization to the other type of sequence. At the risk of repeating myself once too often, I must reemphasize that the results of such studies will only be meaningful if the acoustic history of the animals is controlled from the time they first begin to hear (i.e., before birth), in order to control for effects of differential exposure. For example, animals could be raised hearing equal amounts of tonal versus atonal melodies.19 In conducting such studies, it would be of great theoretical interest to expand the scope beyond Western music to ask about the pitch systems of other cultures. For example, are scales that follow Javanese conventions easier to learn than scales that the Javanese would regard as “ungrammatical”?
Let us step back and take stock. In the introduction, it was explained why the universality of human music and brain specialization for certain aspects of music do not provide any evidence that musical abilities have been the direct target of natural selection. In section 7.3.1, we saw that existing adaptationist conjectures about music are not very persuasive, and sections 7.3.2-7.3.5 showed that data from development, genetics, and animal studies provide (as yet) no compelling reasons to reject the null hypothesis that music has not been a direct target of natural selection. Thus based on current evidence, music does not seem to be a biological adaptation.
I hasten to add that the question is not yet settled, and that there is a little-explored area that will likely prove important for future research on the evolution of music, namely developmental studies of beat-based rhythm perception and synchronization. This area is the focus of the next section. Let us say for the sake of argument, however, that research in this (or any other) area does not provide evidence that human minds have been specifically shaped for music cognition. Does this force us to conclude that music is merely a frill, a hedonic diversion that tickles our senses and that could easily be dispensed with (Pinker, 1997)?
Not at all. I would like to suggest that the choice between adaptation and frill is a false dichotomy, and that music belongs in a different category. Homo sapiens is unique among all living organisms in terms of its ability to invent things that transform its own existence. Written language is a good example: This technology makes it possible to share complex thoughts across space and time and to accumulate knowledge in a way that transcends the limits of any single human mind. The invention of aircraft is another example: These machines have fundamentally changed the way ordinary humans experience their world, allowing large-scale movement between cultures. A final example is the modern Internet, a technology that is changing the way people communicate, learn, and make communities. These are all examples of technologies invented by humans that have become intimately integrated into the fabric of our life, transforming the lives of individuals and groups. This never-ending cycle of invention, integration, and transformation is uniquely human (Clark, 2003), and has ancient roots. I believe music can be sensibly thought of in this framework, in other words, as something that we invented that transforms human life. Just as with other transformative technologies, once invented and experienced, it becomes virtually impossible to give it up.
This notion of music as a transformative technology helps to explain why music is universal in human culture. Music is universal because what it does for humans is universally valued. Music is like the making and control of fire in this respect. The control of fire is universal in human culture because it transforms our lives in ways we value deeply, for example, allowing us to cook food, keep warm, and see in dark places. Once a culture learns fire making, there is no going back, even though we might be able to live without this ability. Similarly, music is universal because it transforms our lives in ways we value deeply, for example, in terms of emotional and aesthetic experience and identity formation. Current archeological evidence suggests music has had this transformative power for a very long time: The oldest undisputed musical instrument is an approximately 36,000-year-old bone flute from Germany (Richter et al., 2000). It seems likely that even earlier instruments will be found, because human art is now known to have at least a 100,000-year-old history (the age of shell beads found in Israel and Algeria; Vanhaeren et al., 2006; cf. Henshilwood et al., 2004). (It will be particularly interesting to know if the most ancient musical instruments will also be found in Africa, because modern humans are thought to have evolved on that continent about 120,000 years ago, and to have emigrated from there about 45,000 years ago.)
Of course, there is one very important sense in which music is unlike other technologies such as fire making or the Internet. Music has the power to change the very structure of our brains, enlarging certain areas due to motor or perceptual experience (Elbert et al., 1995; Pantev et al., 1998, 2001; Münte et al., 2002; Pascual-Leone, 2003), or leading certain areas to specialize in music-specific knowledge (e.g., memories for melodies, as evidenced by selective amusias following brain damage; Peretz, 1996). However, music is not unique in this respect. The brain has a remarkable capacity to change due to experience, and numerous studies of animal and human brains have shown that motor or perceptual experience can change the relative size and organization of specific brain areas (Buonomano & Merzenich, 1998; Huttenlocher, 2002). Furthermore, the specialization of certain human brain regions for reading written orthography demonstrates that learning can lead to neural specialization during development. Thus the human process of invention, internalization, and transformation can change the very organ that makes this process possible (Clark, 2003).
Thus far in this book, the final section of each chapter has outlined a promising area of research with regard to music-language relations. The current chapter is different. So far, it has been argued that there is (as yet) no evidence that seriously challenges the null hypothesis of no direct natural selection for musical abilities. To find such evidence, it will be necessary to demonstrate that there is a fundamental aspect of music cognition that is not a byproduct of cognitive mechanisms that also serve other, more clearly adaptive, domains (e.g., auditory scene analysis or language). Thus this section seeks an aspect of music cognition that is not related to language processing, to refine the evolutionary study of music.
A lack of relationship to language (or other structured cognitive domains) would establish the domain-specificity of the aspect of music cognition in question, which is one important criterion for evidence that the ability has been shaped by natural selection for music. It is not the only criterion, however, because domain-specificity can be a product of development (recall the discussion in section 7.1 of brain areas specialized for reading written text). To be of evolutionary relevance, this aspect of music cognition should develop in a manner that suggests the brain is specifically prepared to acquire this ability. That is, it should develop precociously and spontaneously (e.g., in contrast to learning to read, which is an effortful and slow process). Finally, it would be important to show that the aspect in question is unique to humans, and cannot be acquired by other animals. This last criterion is posited on the assertion that nonhuman animals do not naturally make music (cf. section 7.1 and McDermott & Hauser, 2005). If nonhuman animals can acquire a fundamental aspect of music cognition, then this aspect is latent in animal brains and does not require natural selection for music to explain its existence.
Is there any aspect of music cognition that has the potential to satisfy these three criteria (domain-specificity, innateness, and human-specificity)? Justus and Hutsler (2005) and McDermott and Hauser (2005) have argued that musical pitch processing does not satisfy these criteria (cf. sections 7.3.3-7.3.5 above), but they leave the issue of musical rhythm largely unexplored. I believe there is one widespread aspect of musical rhythm that deserves attention in this regard, namely beat-based rhythm processing. In every culture, there is some form of music with a regular beat, a periodic pulse that affords temporal coordination between performers and elicits a synchronized motor response from listeners (McNeill, 1995; Nettl, 2000). Thus I would like to raise the following question: Might beat-based rhythm processing reflect evolutionary modifications to the brain for the purpose of music making?
In pursuing this question, we shall not concern ourselves with the issue of adaptation. If we have been specifically shaped by natural selection for beat-based processing, then presumably early humans who were able to perceive a musical beat (and synchronize their actions to it) had some selective advantage over those who did not have this ability. Hypotheses have been offered about the adaptive value of beat perception and synchronization in evolution (e.g., Merker, 2000), but these will not be the focus here because the prehistory of music will probably never be known with any certainty (as noted in section 7.3). Instead, the following sections focus on issues of development, domain-specificity, and human-specificity, because research on these issues has the potential to address evolutionary questions with empirical data. As we shall see, these are areas in which there are more questions than answers, with many open avenues for investigation (cf. Patel, 2006c).
One important question about beat-based rhythm is its relationship to speech rhythm, because both music and language have richly structured rhythmic patterns. Although early theories of speech rhythm proposed an underlying isochronous pulse based on stresses or syllables, empirical data have not supported this idea, and contemporary studies of speech rhythm have largely abandoned the isochrony issue (cf. Chapter 3). However, a musical beat does have a more abstract connection to speech rhythm via the notion of meter. A musical beat typically occurs in the context of a meter, a hierarchical organization of beats in which some beats are perceived as stronger than others. Interestingly, speech also has a “metrical” hierarchy based on stress or prominence (Selkirk, 1984; Terken & Hermes, 2000), suggesting that a tendency to organize rhythmic sequences in terms of hierarchical prominence patterns may originate in language. Crucially, however, the “beats” of speech (stressed syllables) do not mark out a regular pulse. This difference has important cognitive consequences. In particular, the use of a perceptually isochronous pulse in music engages periodic temporal expectancies that play a basic role in music cognition (Jones, 1976; Jones & Boltz, 1989), but that appear to play little or no role in ordinary speech perception (cf. Pitt & Samuel, 1990). Humans are able to extract periodicities from complex auditory stimuli, and can focus their expectancies on periodicities at different hierarchical levels in music (Drake, Jones, & Baruch, 2000). These periodic expectancies are the basis of motor synchronization to the beat on the part of listeners, as shown by the fact that listeners typically tap or move slightly ahead of the actual beat, indicating that synchronization is based on structured temporal anticipation (cf. Patel, Iversen, et al., 2005).
Turning from language to other cognitive abilities, one might sensibly ask if the periodic expectancies of beat-based processing are simply a byproduct of the brain’s ability to do generic temporal anticipation, in other words, to gauge when exactly an event of interest is going to occur based on the timing of events in the immediate past. Human brains are adept at generic temporal anticipation. For example, each time we catch a ball or walk down a crowded sidewalk, we engage in this kind of temporal anticipation (e.g., in order to move a hand to the right position to catch the ball, or to step out of the way of an oncoming pedestrian before impact). Crucially, however, these are “ballistic” temporal expectancies based on anticipating a single event, rather than periodic expectancies based on a mental model of recurring time intervals. Informal observation suggests that some people who function perfectly well in everyday life have serious difficulties in musical beat-based processing, which would suggest that the ability to construct periodic expectancies is not a trivial byproduct of the ability to construct generic temporal expectancies.
Beat-based processing also appears to be distinct from the more generic ability to gauge the duration of time intervals, which is widespread among animals. Rabbits, for example, can be trained to learn the duration of a short time interval between a warning tone and a puff of air on the eye. After learning, once they hear the tone, they anticipate when in time the subsequent puff will occur, showing that their brain can do structured temporal anticipation. Neural research indicates that this sort of interval timing recruits a distributed neural network including the basal ganglia, cortex, and thalamus (Matell & Meck, 2000), a network that is probably quite similar across vertebrates. An important aspect of this network is its amodal nature: It is equally good at learning intervals between auditory events, visual events, and so forth. This stands in sharp contrast to beat-based rhythmic processing in music, which appears to have a special relationship to the auditory system, as evidenced by the fact that people have difficulty perceiving a beat in visual rhythmic sequences (Patel, Iversen, et al., 2005). Thus even if beat-based processing has its roots in brain circuits for interval timing, these circuits appear to have been modified in humans to create a special auditory-motor link. Such a modification may be one reason why no other animals besides humans have been reported to move in synchrony with a musical beat (a topic taken up at greater length in section 7.5.3. below).
From a neuropsychological standpoint, little is known about the domain-specificity of beat-based rhythmic processing. A key question here is whether brain damage that disrupts it also disrupts other basic nonmusical cognitive abilities. If so, this might suggest that beat-based processing is based on abilities recruited from other brain functions. The neuropsychological literature contains descriptions of individuals with musical rhythmic disturbance after brain damage, or “acquired arrhythmia” (cf. Chapter 3, section 3.5.3). Two notable findings from this literature are that rhythmic abilities can be selectively disrupted, leaving pitch processing skills relatively intact, and that there are dissociations between rhythmic tasks requiring simple discrimination of temporal patterns and those requiring the evaluation or production of periodic patterns. However, no neuropsychological studies to date have examined relations between deficits in beat-based processing and in other basic cognitive skills. One intriguing hint of a link between musical rhythm skills and language abilities comes from a study of a family in which some members have an inherited speech and language deficit (the KE family, discussed in sections 7.2.9 and 7.3.4 above). Individuals with this deficit—which influences both linguistic syntax and speech articulatory movements—also have difficulty discriminating and reproducing short musical rhythm patterns, in contrast with a relative facility with musical pitch patterns (Alcock et al., 2000b). Unfortunately, beat-based processing was not specifically investigated by Alcock et al., leaving open the question of how the disorder seen in the KE family relates to cognitive processing of periodic rhythmic patterns.
One way to address the innateness of beat-based rhythm processing is via developmental studies, in order to explore whether the brain seems specifically prepared to acquire this ability. Specifically, it is of interest to know whether humans show precocious abilities when it comes to perceiving a musical beat, just as they show precocious abilities in language perception. As discussed in section 7.2.4, by 1 year of age, infants show impressive speech perception abilities. Is there evidence that infants younger than 12 months can perceive a beat in music, in other words, construct a mental framework of periodic temporal expectancies around an inferred pulse? The clearest evidence for this would be a demonstration that infants can synchronize their movements to the beat of music. In fact, young infants do not synchronize their movements to a musical beat (Longhi, 2003). In Western European cultures, the ability to synchronize with a beat does not appear to emerge till around age 4 (Drake, Jones, & Baruch, 2000; Eerola et al., 2006). This is striking given that there seems to be ample opportunity to learn synchronization skills early in life: Most children’s songs have a very clearly marked beat that is emphasized in infant-directed singing (Trainor et al., 1997), nursery tunes have strong distributional cues to metrical structure (i.e., more frequent events on strong beats; Palmer & Pfordresher, 2003; cf. Palmer & Krumhansl, 1990), and infants are frequently rocked or bounced to music (Papousek, 1996). Yet despite these facts, and despite the fact that movement to a beat requires only relatively gross motor skills (e.g., clapping, bobbing up and down, or swaying side to side), synchronized movement to music appears to emerge relatively slowly in development.20
However, it is possible that beat perception skills precede motor synchronization abilities, just as children’s understanding of speech is typically in advance of their own speech production skills. Perceptual studies can help determine whether the human brain is specifically prepared to construct periodic temporal expectancies from musical stimuli.
There is good evidence that infants are sensitive to basic rhythmic aspects of auditory sequences. For example, 2- to 4-month-olds can detect a change in the pattern of duration intervals, for example, a change from a short-long to a long-short pattern of intervals ([x-x---x] versus [x---x-x], in which each x marks a tone onset and each dash corresponds to approximately 100 ms of silence; Demany et al., 1977). At 7-9 months, infants can detect a change in the temporal pattern even when tempo varies. For example, when trained to detect a change from a short-long to a long-short pattern of intervals at one tempo, and then tested with stimuli at other tempi, they still show a response to the change in pattern (Trehub & Thorpe, 1989). This suggests that infants perceive grouping structure in rhythmic patterns, likely based on the patterning of duration ratios between successive events. A facility with grouping perception is also evident in 12-month-olds, who can discriminate rhythmic patterns that have the same duration and number of groups, but different numbers of events in each group (Morrongiello, 1984). Thus it appears that infants have a precocious facility with the perception of rhythmic grouping, which could stem from the importance of grouping in speech perception (cf. Chapter 3, section 3.2.3).
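The bracket notation above can be unpacked mechanically. A small sketch (assuming, as in the text, that each symbol slot lasts roughly 100 ms) converts a pattern string into its inter-onset intervals, making the short-long versus long-short contrast explicit; note that the 1:2 ratio between intervals survives any tempo change, which is what the 7- to 9-month-olds appear to track:

```python
# Convert the x/dash rhythm notation used above into inter-onset intervals.
# SLOT_MS = 100 follows the text's statement that each dash corresponds
# to approximately 100 ms of silence.

SLOT_MS = 100

def onset_intervals(pattern, slot_ms=SLOT_MS):
    """Return inter-onset intervals in ms for a string like 'x-x---x'."""
    onsets = [i for i, ch in enumerate(pattern) if ch == 'x']
    return [(b - a) * slot_ms for a, b in zip(onsets, onsets[1:])]

print(onset_intervals('x-x---x'))  # short-long pattern: [200, 400]
print(onset_intervals('x---x-x'))  # long-short pattern: [400, 200]
```

Passing a different `slot_ms` rescales the tempo while leaving the pattern of duration ratios unchanged, which is the invariance Trehub and Thorpe’s infants generalized across.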
Is there evidence that infants are also adept at the perception of a beat in music? It is known that 2- to 4-month-old infants can detect a change in tempo (of 15%) in an isochronous sequence, when events have an approximately 600 ms IOI (Baruch & Drake, 1997), and by 6 months old, infants can detect a 10% change in durations of events in an isochronous sequence (Morrongiello & Trehub, 1987). However, the former finding could be explained via a sensitivity to the average rate of events over time, and the latter via sensitivity to the absolute durations of tones (cf. Morrongiello, 1984). Thus specific tests of sensitivity to beat-based rhythmic patterns are needed. Although there are a number of elegant studies of rhythm processing in infancy that might seem to indicate beat-based processing (e.g., Hannon & Trehub, 2005; Hannon & Johnson, 2005; Phillips-Silver & Trainor, 2005), closer examination suggests that caution is warranted (the interested reader may consult the chapter appendix for a detailed discussion). Here I focus on one study whose methodology seems promising for future work on periodic temporal expectancies in infants.
Bergeson and Trehub (2006) examined the ability of 9-month-old infants to detect a subtle temporal change in short rhythmic sequences of tones, in which all tones had the same pitch and intensity. Although the tones had no physical accents, according to the research of Povel and Okkerman (1981), some of the tones had “subjective accents” due to the position of events in groups (Figure 7.3). In three of these patterns, the subjective accents were consistent with a regular beat (one such pattern is shown in Figure 7.3a).
Figure 7.3 Two rhythm patterns used by Bergeson and Trehub (2006) to study infant perception of a beat. Tones bearing subjective accents are marked with a “>.” Note how in pattern (A), accented tones tend to align with an isochronous beat with a period of 600 ms (marked by vertical lines below the rhythm pattern). In pattern (B), the accented tones do not align well with a regular beat. Subjective accents are assigned according to the rules of Povel and Okkerman (1981), in other words, on the first and last tones for groups with three or more tones, on the second tone for groups of two tones, and on isolated tones.
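The accent-assignment rules summarized in the caption are algorithmic enough to state as code. The sketch below is an illustrative implementation of the Povel and Okkerman (1981) rules as just described; the list-of-group-sizes representation and the function name are assumptions of convenience, not the authors' notation.

```python
# A minimal sketch of the Povel & Okkerman (1981) subjective-accent rules
# described above. A rhythm is represented as a list of group sizes
# (number of tones per group) -- an illustrative representation.

def subjective_accents(groups):
    """Return one boolean per tone, True where a subjective accent falls:
    - isolated tones (groups of 1) are accented;
    - in groups of 2, the second tone is accented;
    - in groups of 3 or more, the first and last tones are accented."""
    accents = []
    for size in groups:
        if size == 1:
            accents.append(True)
        elif size == 2:
            accents.extend([False, True])
        else:  # size >= 3
            accents.extend([True] + [False] * (size - 2) + [True])
    return accents

# Example: a rhythm with groups of 3, 2, and 1 tones.
print(subjective_accents([3, 2, 1]))
# [True, False, True, False, True, True]
```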
In a fourth sequence, however, the accents were not consistent with a regular beat (Figure 7.3b). (The reader may check this by listening to Sound Examples 7.2a and b, which correspond to the two rhythms in Figure 7.3a and b. It is relatively easy to tap a regular beat to Sound Example 7.2a, but harder for Sound Example 7.2b.) Research with adults has shown that rhythmic sequences with regular subjective accents are easier to learn and reproduce than those with irregular accents, presumably because they facilitate the perception of the pattern in terms of temporally predictable beats (Povel & Essens, 1985; cf. Patel, Iversen, et al., 2005). Using a conditioned head-turn procedure that rewarded detection of a change in a repeating pattern, Bergeson and Trehub found that the infants more easily detected a small duration decrement to a single note in patterns with regular accents. They interpret this to mean that the infants extracted the regularity of these accents and used it to induce a beat-based framework for perceiving the temporal pattern. The results are indeed consistent with this view, yet cannot be considered as definitive evidence for beat perception by infants, because they rely on just a single pattern with irregular accents.21 More work is needed to show that the results generalize to other patterns with versus without regular subjective accents. The study is important, however, in introducing a useful method for testing beat perception by infants, together with suggestive evidence along these lines. If the results are upheld, this would stand as a challenge to the null hypothesis for the evolution of music. That is, a brain wired for beat perception from early in life would be suggestive of circuits shaped by natural selection for music.
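The notion of accents aligning, or failing to align, with an isochronous beat can be made concrete. The following sketch scores what fraction of accented onsets fall on a periodic grid; the 600 ms period matches Figure 7.3, but the function, its name, and the tolerance parameter are illustrative assumptions rather than an analysis used in the study.

```python
# Illustrative sketch: how well do accented onsets (in ms) align with an
# isochronous beat grid, in the spirit of Figure 7.3? The tolerance value
# is an assumption for illustration.

def accent_beat_alignment(accent_onsets, period=600, tol=1e-6):
    """Fraction of accented onsets falling on a multiple of the period."""
    aligned = sum(1 for t in accent_onsets
                  if min(t % period, period - (t % period)) <= tol)
    return aligned / len(accent_onsets)

# Accents exactly on a 600 ms grid align perfectly...
print(accent_beat_alignment([0, 600, 1200, 1800]))  # 1.0
# ...whereas irregularly placed accents do not.
print(accent_beat_alignment([0, 450, 1200, 1650]))  # 0.5
```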
Whereas the previous section focused on the perception of a musical beat, this section focuses on synchronized movement to a beat. It is an intriguing fact that there are no reports of nonhuman animals spontaneously moving to the beat of music. This naturally leads to the question of whether synchronization to a beat involves cognitive and neural mechanisms that have been shaped by natural selection for musical abilities. One way to test this idea is to ask whether nonhuman animals (henceforth, animals) are capable of learning to move in synchrony with a musical beat. If so, this would show that natural selection for music is not necessary to account for this ability, because animal nervous systems have not been shaped by selection for music (cf. section 7.1 and McDermott & Hauser, 2005).
Recently, it has been shown that at least one species of animal (the Asian elephant, Elephas maximus) can learn to drum a steady beat on a musical instrument in the absence of ongoing timing cues from a human (Patel & Iversen, 2006). Indeed, using a mallet held in its trunk, an elephant can strike a drum with a rhythmic regularity that exceeds even that of humans drumming at the same tempo (Sound/Video Example 7.3).22 However, the elephants studied (members of the Thai Elephant Orchestra) showed no evidence of synchronizing their drumming to a common beat when performing in an ensemble setting.
Of course, it is well known that animals can synchronize with each other in producing periodic signals, for example, the rhythmic chorusing of crickets, frogs, or fireflies in their courtship displays (cf. Gerhardt & Huber, 2002, Ch. 8). Superficially, this may seem equivalent to human synchronization with a beat. A closer examination of these communicative displays suggests otherwise. For example, research on crickets and katydids has revealed that group synchrony in chirping is likely to be an epiphenomenon of local competitive interactions between males all trying to call first. Males attempt to call ahead of other nearby males because females find “leading calls” attractive (Greenfield et al., 1997; Römer et al., 2002). Crucially, the mechanism used by males to adjust the rhythm of their calls (in order to try to call first) does not involve matching their call period to a periodic model, but rather is a cycle-by-cycle phase adjustment in response to the calls of other males. When multiple males all use this strategy, episodes of synchrony emerge as an unintended byproduct. In other words, there is no evidence that rhythmic synchrony in crickets results from processes of structured temporal anticipation.
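The distinction drawn here, between matching one's period to a periodic model and making cycle-by-cycle phase adjustments, can be illustrated with a toy model. In the sketch below, two callers keep their intrinsic periods fixed and merely nudge the timing of each next call toward the neighbor's last call; asynchrony shrinks toward a small constant residue without ever reaching zero, because the periods themselves never change. All parameter values, the coupling rule, and the function name are illustrative assumptions, not a model of the specific competitive mechanism in the cited studies.

```python
# Toy model of cycle-by-cycle phase adjustment (as opposed to period
# matching): two callers with fixed intrinsic periods p1 and p2 each
# shift the timing of their next call by a fraction (alpha) of the
# current asynchrony. All values are illustrative assumptions.

def simulate_phase_adjustment(p1, p2, alpha=0.3, cycles=40):
    """Return the absolute asynchrony (ms) between the two callers on
    each cycle. Intrinsic periods never change; only the phase of each
    next call is adjusted toward the neighbor's last call."""
    t1, t2 = 0.0, 30.0  # initial call times: caller 2 starts 30 ms late
    asynchronies = []
    for _ in range(cycles):
        t1, t2 = (t1 + p1 + alpha * (t2 - t1),
                  t2 + p2 + alpha * (t1 - t2))
        asynchronies.append(abs(t1 - t2))
    return asynchronies

a = simulate_phase_adjustment(500, 510)
print(round(a[0], 1), round(a[-1], 1))  # 22.0 16.7: near-synchrony emerges,
                                        # but a residual lag remains because
                                        # the intrinsic periods still differ
```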
Interestingly, synchrony among fireflies may be a different story, involving changes to the period of rhythmic flashing in order to come into synchrony with neighbors (Buck, 1988; Greenfield, 2005). This sort of purposeful synchrony is more like human synchronization to a beat, but firefly flashing nevertheless differs in important ways from human movement to music. Notably, humans can synchronize across a wide tempo range, can synchronize with complex rhythmic stimuli (i.e., can move to the beat of complex acoustic patterns, including those with syncopation), and show cross-modal synchronization, with an auditory stimulus driving the motor system in periodic behavior that is not (necessarily) aimed at sound production. Firefly synchrony does not exhibit these features: Fireflies synchronize within a very limited tempo range, show no evidence of synchronizing to the beat of complex rhythmic light patterns, and produce a response that is in the same modality as the input signal.
These differences argue against the view that human synchronization to a beat reflects widespread mechanisms of synchronization in biological systems. Instead, it appears that this behavior may reflect music-specific abilities. One way to test this idea is to see if nonhuman animals can be trained to move in synchrony with a musical beat. In this regard, it is a remarkable fact that despite decades of research in psychology and neuroscience in which animals have been trained to do elaborate tasks, there is not a single report of an animal being trained to tap, peck, or move in synchrony with an auditory beat. This is particularly surprising given the fact that there are many studies showing that animals have accurate neural mechanisms for interval timing (e.g., Moore et al., 1998).
One might object that moving to a beat is an unnatural behavior for an animal, but this misses the point. Monkeys, for example, are often trained to do highly ecologically unnatural tasks in neuroscience experiments for the purpose of research on neural mechanisms of perception or motor control (e.g., drawing figures with a cursor controlled by a handheld joystick; Schwartz et al., 2004). Thus the relevant question is whether an animal could learn to move to a beat. If so, then (as noted previously) this would indicate that natural selection for music is not necessary to account for this ability.
A question that immediately arises is which animals one should study. Chimps and bonobos may seem the obvious choice. Among the great apes, they are the most closely related to humans. They are also highly intelligent, as evidenced by research with language-trained apes such as Kanzi (Savage-Rumbaugh et al., 1998). Furthermore, chimps and bonobos produce short bouts of rhythmic “drumming” with their hands or feet as part of display or play behavior (Arcadi et al., 1998; Fitch, 2006; Kugler & Savage-Rumbaugh, 2002), meaning that they can voluntarily produce rhythmic movements on a timescale appropriate for synchronization to a beat.
Despite these facts, there are reasons to question whether apes (and non-human primates in general) are capable of moving in synchrony with a beat. These reasons pertain to the brain circuits that are involved in beat perception and motor control. Perceptual research on humans using fMRI indicates that rhythms that do (vs. do not) have a regular beat are associated with increased activity in the basal ganglia (Grahn & Brett, 2007). This deep brain structure is known to be an essential part of the distributed circuit (involving the cerebral cortex, basal ganglia, and thalamus) involved in interval timing, in other words, in gauging temporal intervals in the time range relevant to musical beat perception (Matell & Meck, 2000). Importantly, the basal ganglia are also involved in motor control and sequencing (cf. Janata & Grafton, 2003), meaning that a brain structure involved in perceptually “keeping the beat” is also involved in the coordination of patterned movement.
If synchronizing to a beat simply required that a common brain structure be involved in interval timing and motor control, then one would expect that chimps (and many other animals) would be capable of this behavior. This is because the basal ganglia subserve interval timing and motor control functions across a wide range of species, including primates and rodents (Buhusi & Meck, 2005). However, I suspect moving to a beat requires more than just a common brain structure that handles both of these functions. This is because synchronization to a beat involves a special relationship between auditory temporal intervals and patterned movement, as evidenced by the fact that visual rhythms poorly induce synchronized movement in humans (Patel, Iversen, et al., 2005). Yet the interval timing abilities of the basal ganglia are amodal, applying equally well to intervals defined by auditory versus visual events. This suggests that some additional force in human evolution modified the basal ganglia in a way that affords a tight coupling between auditory input and motor output.
One plausible candidate for this evolutionary force is vocal learning. Vocal learning involves learning to produce vocal signals based on auditory experience and sensory feedback. This ability seems commonplace to us, because every child exhibits it as part of learning to speak. An evolutionary perspective, however, reveals that vocal learning is an uncommon trait, having arisen in only a few groups of animals (including songbirds, parrots, cetaceans, and some pinnipeds; cf. Fitch, 2006; Merker, 2005, and section 7.2.3). Notably, humans are unique among primates in exhibiting complex vocal learning (Egnor & Hauser, 2004).
Vocal learning requires a tight coupling between auditory input and motor output in order to match vocal production to a desired model. This online integration of the auditory and motor systems places special demands on the nervous system. Neurobiological research on birds indicates that vocal learning is associated with modifications to the basal ganglia, which play a key role in mediating a link between auditory input and motor output during learning (Doupe et al., 2005). Because there are many anatomical parallels between basal ganglia anatomy in birds and mammals, it seems plausible to suggest that human basal ganglia have also been modified by natural selection for vocal learning (cf. Jarvis, 2004). The resulting tight coupling between auditory input and motor output may be a necessary foundation for synchronizing to a beat.
The foregoing observations can be condensed into a specific and testable hypothesis, namely that having the neural circuitry for complex vocal learning is a necessary prerequisite for the ability to synchronize with an auditory beat. This “vocal learning and rhythmic synchronization hypothesis” predicts that attempts to teach nonhuman primates to synchronize to a beat will not be successful.23,24 Furthermore, it suggests that if primates do fail at synchronizing to a beat, it would be premature to conclude that this ability is unique to humans. It would be essential to test nonhuman vocal learners for this ability. In this regard, it is interesting to note that there are anecdotal reports of parrots moving rhythmically in response to music (Sound/Video Example 7.4; cf. Patel, 2006b).25 If future research demonstrates that humans are unique in being able to learn to move in synchrony with a musical beat, this would be suggestive of circuits shaped by natural selection for music.
Whether human bodies and brains have been shaped by natural selection for music is a topic of vigorous debate. To address this question, this chapter has used language as a foil for music. In the case of language, there appears to be enough evidence to reject the null hypothesis that humans have not been directly shaped by natural selection for this ability. In the case of music, however, I do not think enough evidence has accumulated to reject the null hypothesis. I hasten to add, however, that this does not mean the question is settled. Further research is needed to address whether humans have been specifically shaped by evolution to acquire musical abilities. One line of research especially worth pursuing concerns beat-based rhythmic processing, and the extent to which this represents a domain-specific, innate, and uniquely human ability.
Whatever the outcome of this research, it should be kept in mind that the notion that something is either a product of biological adaptation or a frill is based on a false dichotomy (section 7.4). Music may be a human invention, but if so, it resembles the ability to make and control fire: It is something we invented that transforms human life. Indeed, it is more remarkable than fire making in some ways, because not only is it a product of our brain’s mental capacities, it also has the power to change the brain. It is thus emblematic of our species’ unique ability to change the nature of ourselves.
This is an appendix to section 7.5.2.
Hannon and Trehub (2005) conducted an interesting study of musical meter perception. They familiarized adults and 6-month-old infants with two different kinds of musical excerpts that had a rich melodic and rhythmic structure. The first kind had a meter based on isochronous beats (as is typical in Western European music), whereas the second kind had a meter based on nonisochronous beats (specifically, the beats formed a repeating short-short-long pattern, common in Balkan music; cf. Chapter 3, section 3.2.1). The adults were asked to rate variations of each of these patterns for their rhythmic similarity to the original pattern. For each pattern (isochronous or nonisochronous meter), two variants were created by inserting a note into one of the measures of the pattern. In one variant, the insertion was made in a way that preserved the metrical pattern, whereas in the other variant, the insertion disrupted the meter (resulting in one measure with an extra 1/2 beat). These were called the “structure preserving” (SP) and “structure violating” (SV) patterns, respectively.
For North American adults, the key finding was that the SV pattern was rated less similar to the familiarization stimulus than the SP pattern for the isochronous meter condition, but not for the nonisochronous meter condition. In other words, adults seemed to find it easier to detect a rhythmic change that violated the prevailing meter when that meter was based on evenly timed beats than on unevenly timed beats. This suggests that the adults had difficulty extracting the unevenly timed beats from Balkan music and using them as a framework for the perception of temporal patterns. (This interpretation was supported by the fact that Balkan adults were also tested, and rated SV as worse than SP for both isochronous and nonisochronous meters.)
The infants in this study (all North American) were tested with a familiarization-preference procedure. After being familiarized with an excerpt in either the isochronous or nonisochronous meter, they were played the SV and SP variant of that same pattern in alternating trials, and their looking time to the two variants was quantified. A longer looking time to the SV variant was taken as an indication that they discriminated SV from SP. The main finding of interest was that the infants showed greater looking time to SV versus SP patterns for both the isochronous and nonisochronous meters (suggestive of a novelty preference). In other words, infants seemed to discriminate changes in isochronous and nonisochronous metrical patterns equally well. This finding is reminiscent of classic findings in speech perception development, showing that 6-month-olds can discriminate both native and nonnative phoneme contrasts, in contrast to adults, who have difficulty discriminating nonnative contrasts (cf. Chapter 2, section 2.3.4).
It is tempting to interpret this study as evidence that infants perceive a beat in music, in other words, that they form a mental framework of structured temporal anticipation based on the timing of beats in complex musical patterns, and that they are equally adept at doing this with isochronous and nonisochronous meters. However, there are other possible explanations of the results. Although it is reasonable to assume the adults extracted a beat whenever they could, it is possible that the infants did not extract regular beats and attended instead to nonperiodic duration ratio patterns between successive notes. Such patterns changed in both the SV and SP conditions in both meters, and could thus have served as a basis for the observed novelty preference. Thus the infants’ responses do not necessarily indicate the perception of a regular beat in terms of a framework of periodic temporal anticipation.
One way to demonstrate true meter perception in infants would be to show that they extract meter as a feature from stimuli that vary along other dimensions. Hannon and Johnson (2005) conducted a relevant study in this regard. They used a habituation paradigm in which 7-month-old infants listened to complex rhythmic patterns that differed in their temporal structure, but that were all designed to yield a common sense of meter. Four different stimuli were used to evoke a duple meter (a perceived beat on every other event, with a strong beat every four events), whereas four other stimuli were used to evoke a triple meter (a perceived beat on every third event). The rhythmic patterns did not contain any physical accents: All tones were of the same intensity and frequency (C5, 523 Hz). Thus the only accents were "subjective accents" due to the position of events in groups, with the meter being created by the temporal structure of these subjective accent patterns (Povel & Essens, 1985). (Trained musicians confirmed that the first set of patterns fit better with a duple meter and the second set with a triple meter.)
Infants heard three of the patterns from one category repeatedly until they habituated (based on looking time) or a fixed number of trials elapsed. After this, they were presented with a test in which two novel rhythms alternated: one from the same metrical category and one from the other metrical category. The key question was whether infants would show some evidence of having extracted the meter of the habituation stimuli. Hannon and Johnson found that infants looked longer at the stimuli with the novel meter, and argued that this provides evidence for the extraction of metrical structure. They suggest that the precocious ability to extract meter may help the infant to learn other aspects of music such as tonality, because structurally important pitches tend to occur on metrically strong beats. They draw an analogy to the way that speech rhythm may help infants learn aspects of linguistic structure, such as the location of word boundaries in running speech (Cutler, 1994).26
This study is noteworthy because of its elegant design, clear results, and strong claims. However, once again there are some uncertainties about the interpretation of the results. Notably, there were differences in the grouping structure of patterns in duple and triple meters, so that the results could reflect a novelty preference based on grouping rather than meter (as noted by the authors). To control for this, the study included a second experiment with the same habituation stimuli but new test stimuli designed to eliminate this confound. Specifically, there were two test patterns: One had accents consistent with a duple meter but a grouping structure that resembled the triple meter patterns, whereas the other had accents consistent with a triple meter but a grouping structure resembling the duple meter patterns. As in the first experiment, infants were familiarized with either the duple or triple meter patterns and then given a preference test for the two novel patterns. Infants looked longer at the pattern with the novel meter. Although this seems like a straightforward confirmation of the findings in experiment 1, there is another possible interpretation of these results. As noted by the authors, creating test stimuli that had a novel meter but a familiar grouping structure resulted in patterns that did not have as strong cues to meter as did the habituation patterns. Thus recognizing a novel meter would be expected to be difficult.
This raises the concern that the observed asymmetry in looking time reflected a familiarity preference for the old grouping structure, rather than a novelty preference for the new meter. This explanation posits that infants in experiment 1 showed a novelty preference for a new grouping structure, whereas infants in experiment 2 showed a familiarity preference for an old grouping structure. At first glance, this may seem implausible, but the possibility of such a change in response pattern is raised by findings by Saffran and colleagues. These researchers familiarized infants with a tone sequence and then tested their ability to discriminate familiar versus unfamiliar fragments from this sequence (also using looking time as a dependent variable). Although infants showed a novelty preference in a first study (Saffran et al., 1999), they showed a familiarity preference in a second study that used a similar paradigm (Saffran et al., 2005). Hence flips between novelty and familiarity preference can occur between studies for reasons that are not well understood.
Thus more work is needed to resolve the issue of whether the infants in Hannon and Johnson’s study reacted based on novelty or familiarity. Even if this question can be resolved and one can be confident that infants looked longer at novel metrical patterns, the question remains whether the cues that made these patterns novel were in fact metrical (e.g., periodic beats every three events instead of every two events), or if they were due to local cues (such as duration ratios and event density) that may have been confounded with the novel meter. This is worth asking because the use of local rhythmic cues would be consistent with past theorizing about infants’ rhythm perception skills (Drake, 1998).
A different approach to infant meter perception was taken by Phillips-Silver and Trainor (2005). They played 7-month-old infants a rhythm consisting of a sequence of isochronous percussive sounds (about three per second) with no physical accents. The pattern used two different kinds of percussive sounds (snare drum and slapstick) arranged in a fashion that was meant to be ambiguous between a duple and a triple meter (an accent every second or every third event). During this familiarization phase, half the infants were bounced on every second sound, whereas the other half were bounced on every third sound. Immediately thereafter, infants were given a preference test in which they heard two versions of the rhythm, one that had physical (intensity) accents on every other event, and one with accents on every third event. The infants controlled how long each version played by their direction of gaze, and showed a preference for hearing the version that matched their own movement experience. This shows a cross-modal integration of movement and audition in rhythm perception (which was the focus of the study). To some, it may also suggest that infants perceived meter, which entails beat perception based on periodic temporal expectancies. However, this need not be the case. The observed preference might be based on grouping rather than meter, because an accent every second versus third event creates chunks with two versus three elements. A critical question here is whether the regular timing of the events and accents is important to the cross-modal generalization, or whether what is being generalized is chunk size (in number of elements).
To summarize, studies of infant rhythm perception that aim to study beat perception need to demonstrate that infants are not simply responding on the basis of cues such as the absolute durations of events or of intervals between events, duration ratio patterns, grouping, or event density.
1 Whether or not animals can be trained to produce or appreciate music is dealt with later in this chapter.
2 The genetics of absolute pitch and its relevance to evolutionary arguments are dealt with later in this chapter, in section 7.3.4.
3 There are of course alternative proposals for the selective pressure behind language, including the idea that language originally served to facilitate bonding in social groups that were too large for traditional primate grooming strategies (Dunbar, 2003). The precise selective pressures behind language evolution are not the focus here.
4 Of course, once babbling has commenced, it is quickly influenced by environmental input. For example, hearing babies begin to show signs of the influence of their native language’s intonation in their first year (Whalen et al., 1991), and the quantity and complexity of babbling are modulated by social feedback from their caregiver (Goldstein et al., 2003).
5 A second, smaller descent occurs in males at puberty (Fitch & Giedd, 1999). See Ohala (1984) for a discussion of the evolutionary significance of this difference between the sexes.
6 Note that although language does not require speech (as evidenced by the fact that sign languages are full human languages; Klima & Bellugi, 1979; Emmorey, 2002), speech is a primary channel for language and thus morphological specializations for speech are also evidence of selection for language.
7 Gibbons are apes well known for their impressive singing and their male-female duets, but in contrast to songbirds, they do not appear to learn their song (Geissmann, 2000).
8 The dependent measure was voice-onset time (VOT), the time from the release of a stop consonant to the onset of voicing for the vowel.
9 Studies of the critical period have shown that although age of exposure to language has a salient influence on grammar and on phonology (i.e., “native accent”; Piske et al., 2001), it does not have a pronounced effect on the acquisition of vocabulary (Newport, 2002). Thus the critical period for language appears to influence the formal, combinatorial aspects of language more than the semantic aspects.
10 “CODA” stands for “children of deaf adults,” now a well-organized community. See http://www.coda-international.org.
11 The melody was transposed on each repetition to one of three keys. The continuous transposition served to focus attention on relative rather than absolute pitch cues, and built on the fact that infants recognize the similarity of melodies presented in transposition (Cohen et al., 1987).
12 Diffusion tensor imaging (DTI).
13 In making this suggestion, I am of course skimming over the thorny issue of how to match musical and linguistic input. Simply counting the number of minutes per day a child hears speech versus music may not be satisfactory, especially if the spoken input involves social interaction but the musical input is passively listening to CDs, because social interaction facilitates learning of speech sounds in humans (Kuhl et al., 2003). However, I do not think this problem is insurmountable, particularly if one can work with music-loving parents who would enjoy musical interactions with their children. Musicians or music teachers who practice/work at home may be good candidates for such studies.
14 Another condition that could have been used to illustrate the complexity of human genotype-phenotype interactions in language abilities is Williams syndrome, which results from a chance genetic deletion in a specific region of chromosome 7 (Bellugi & St. George, 2001; Karmiloff-Smith et al., 2003; Levitin et al., 2004).
15 Like most other multicellular organisms, humans carry two copies of each gene, one per chromosome, except on portions of the male Y chromosome.
16 Interestingly, the DNA sequence in the coding region of FOXP2 is very similar in avian learners and nonlearners (Webb & Zhang, 2005), even though the pattern of expression of this gene is very different in the brains of these two kinds of birds. This suggests that important evolutionary modifications to this gene occurred in its regulatory region rather than in its coding region. Regulatory changes may have influenced the timing and amount of the gene’s expression during development, which had important effects on the functional properties of brain circuits in which it is expressed.
17 Personally, I am not persuaded that differences in training have been ruled out. Even if Asian and Western students participate in the same training programs, there could be differences in attitudes about how much to practice and what skills to focus on during practice (e.g., due to differences in cultural background). I am grateful to Roger V. Burton for raising this point.
18 The tonal and atonal melodies were 7-note sequences whose degree of tonality was quantified using the Maximum Key Profile Correlation (MKC) method of Takeuchi (1994), based on the work of Krumhansl and Kessler (1982). Atonal melodies had MKC values of less than 0.57, whereas tonal melodies had a value of 0.75 or above.
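The MKC measure lends itself to a brief illustration. The sketch below is a hedged, minimal rendering of the idea rather than Takeuchi’s published implementation: it correlates a melody’s pitch-class distribution with the Krumhansl–Kessler (1982) probe-tone profiles under all 24 major and minor transpositions and takes the maximum correlation. Encoding the melody as a bare list of pitch classes (ignoring duration weighting) is an assumption made for illustration.

```python
# Hedged sketch of the Maximum Key Profile Correlation (MKC) idea:
# correlate a melody's pitch-class counts with the Krumhansl-Kessler
# key profiles in all 24 transpositions and take the maximum.
# The profile values are the published K&K (1982) ratings; the
# unweighted pitch-class encoding is an illustrative assumption.
import statistics

MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def mkc(pitch_classes):
    """Maximum correlation between the melody's pitch-class counts
    and the 24 transposed major/minor key profiles."""
    counts = [pitch_classes.count(pc) for pc in range(12)]
    best = -1.0
    for profile in (MAJOR, MINOR):
        for tonic in range(12):
            # Rotate so the profile's tonic weight sits at pitch class `tonic`.
            rotated = profile[-tonic:] + profile[:-tonic]
            best = max(best, pearson(counts, rotated))
    return best

# A diatonic C-major collection correlates strongly with its best-fitting key,
# placing it toward the tonal end of the MKC scale.
scale_mkc = mkc([0, 2, 4, 5, 7, 9, 11])
```

On this sketch, a diatonic collection scores high on the measure, while a melody that fits no key profile well would fall below the tonal cutoff; the specific 0.57 and 0.75 cutoffs above refer to Takeuchi’s method, not necessarily to this simplified encoding.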
19 One would not want to raise the animals in constant white noise, as this can lead to abnormal cortical auditory maps (Chang & Merzenich, 2003).
20 We currently lack data on the youngest age at which children are capable of reliably synchronizing to a beat. Further developmental work is needed in this area. Experiments in societies in which there is an active musical culture among children would be of particular interest here (Blacking, 1967).
21 Furthermore, one of the patterns with regular accents did not show a processing advantage over the pattern with irregular accents, suggesting that factors other than the regularity of subjective accents are at play in these results.
22 Footage taken by the author in October 2006, in Lampang, Thailand, at the Thai Elephant Conservation Center, thanks to the kind hospitality of David Sulzer and Richard Lair. Note that the mahout (trainer), who is in blue and stands to the right and behind the elephant, is not giving any verbal, visual, or tactile timing cues. The elephant is a 13-year-old female named Pratida.
23 There is an old report of a female white-handed gibbon (Hylobates lar) in a German zoo that followed the beats of a metronome with short calls (Ziegler & Knobloch, 1968, cited in Geissmann, 2000). However, Geissmann (personal communication) believes that the gibbon was likely calling in the intervals between the metronome ticks. This could be a stimulus-response pattern, as observed in certain frogs, rather than a behavior based on structured temporal anticipation (cf. Gerhardt & Huber, 2002, Ch. 8). As in studies of frogs, the critical test would be to manipulate the timing of the ticks so that they occurred at the same average tempo but with temporally irregular intervals. If the gibbon calls with short latency after each tick, this suggests a stimulus-response pattern because the animal cannot temporally anticipate when the ticks occur.
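The proposed control can be made concrete with a small sketch: generate a regular and a jittered tick sequence with the same mean tempo. Only the regular sequence supports temporal anticipation; a stimulus-response pattern would produce short-latency calls in both conditions. The jitter range, tempo, and seed below are illustrative assumptions, not parameters from any published study.

```python
# Sketch of the proposed gibbon test: ticks at the same average tempo,
# either perfectly regular (anticipatable) or temporally irregular
# (not anticipatable). Jitter magnitude, tempo, and seed are assumptions.
import random

def tick_times(n_ticks, mean_interval, jitter=0.0, seed=0):
    """Onset times for n_ticks; each inter-tick interval is
    mean_interval plus uniform jitter in [-jitter, +jitter] seconds."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_ticks):
        times.append(t)
        t += mean_interval + rng.uniform(-jitter, jitter)
    return times

def mean_interval(times):
    """Average gap between successive onsets."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

regular = tick_times(20, 0.5)                 # metronomic: every gap is 0.5 s
irregular = tick_times(20, 0.5, jitter=0.15)  # same average tempo, jittered gaps
```

If calls still follow each tick at short latency in the irregular condition, the behavior is reactive; if responses degrade or drift, anticipation based on the regular period was doing the work.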
24 Mithen (2005:153) also predicts that nonhuman primates will not be able to synchronize to a musical beat, but for reasons different from those suggested here.
25 Sound/Video Example 7.4 is an excerpt from the film The Wild Parrots of Telegraph Hill (reproduced with permission from Pelican Media), which describes the relationship between a flock of parrots that live wild in San Francisco and a man who helps care for them (Mark Bittner). In this clip, Mark Bittner plays guitar while one of the parrots from the flock, named Mingus, moves rhythmically to the music. Although Bittner himself moves rhythmically in this clip, raising the question of whether the parrot is imitating his movements, he reports (personal communication) that he normally plays guitar while sitting still, and first noticed Mingus’s rhythmic movements in that context.
26 Hannon and Johnson make no claim that speech rhythm is periodic, simply that organized patterns of timing and accent in both domains may help bootstrap learning of other aspects of structure.
This book has explored music-language relations from a variety of perspectives. These explorations indicate that music and language should be seen as complex constellations of subprocesses, some of which are shared, and others not. In many cases, the links are not obvious at first glance. Yet they are there, and they run deeper than has generally been believed. Exploring this network of relations, with due attention to both the similarities and the differences, can improve our understanding of how the mind assembles complex communicative abilities from elementary cognitive processes.
Indeed, a fundamental message that runs through this book can be summarized in two statements:
1. As cognitive and neural systems, music and language are closely related.
2. Comparing music and language provides a powerful way to study the mechanisms that the mind uses to make sense out of sound.
Music-language studies also suggest ways of bridging the current divide between the sciences and the humanities. Prominent minds on both sides of this divide are advocating for studies that bring these two frameworks of human knowledge together (e.g., Wilson, 1998; Becker, 2004; Edelman, 2006). The study of music-language relations is one area in which scientific and humanistic studies can meaningfully intertwine, and in which interactions across traditional boundaries can bear fruit in the form of new ideas and discoveries that neither side can accomplish alone. Studies that unify scientific and humanistic knowledge are still uncommon, yet to paraphrase the poet John Donne, they address that subtle knot which makes us human.
Abercrombie, D. (1967). Elements of General Phonetics. Chicago: Aldine.
Abraham, G. (1974). The Tradition of Western Music. Berkeley: University of California Press.
Acker, B. E., Pastore, R. E., & Hall, M. D. (1995). Within-category discrimination of musical chords: Perceptual magnet or anchor? Perception and Psychophysics, 57:863–874.
Adams, S. (1997). Poetic Designs: An Introduction to Meters, Verse Forms, and Figures of Speech. Peterborough, Ontario, Canada: Broadview Press.
Adolphs, R., Damasio, H., & Tranel, D. (2002). Neural systems for recognition of emotional prosody: A 3-D lesion study. Emotion, 2:23–51.
Agawu, K. (1991). Playing With Signs: A Semiotic Interpretation of Classic Music. Princeton, NJ: Princeton University Press.
Agawu, K. (1995). African Rhythm: A Northern Ewe Perspective. Cambridge, UK: Cambridge University Press.
Alcock, K. J., Passingham, R. E., Watkins, K. E., & Vargha-Khadem, F. (2000a). Oral dyspraxia in inherited speech and language impairment and acquired dysphasia. Brain and Language, 75:17–33.
Alcock, K. J., Passingham, R. E., Watkins, K., & Vargha-Khadem, F. (2000b). Pitch and timing abilities in inherited speech and language impairment. Brain and Language, 75:34–46.
Allen, G. (1878). Note-deafness. Mind, 3:157–167.
Allen, G. D., & Hawkins, S. (1978). The development of phonological rhythm. In: A. Bell & J. Hooper (Eds.), Syllables and Segments (pp. 173–175). Amsterdam: North-Holland.
Allen, R., Hill, E., & Heaton, P. (submitted). “Hath charms to soothe”: An exploratory study of how high-functioning adults with ASD experience music.
Anderson, L. (1959). Ticuna vowels with special regard to the system of five tonemes (As vogais do tikuna com especial atenção ao sistema de cinco tonemas). Série Lingüística Especial, 1:76–119. Rio de Janeiro: Museu Nacional.
Anderson, S. R. (1978). Tone features. In: V. Fromkin (Ed.), Tone: A Linguistic Survey (pp. 133–176). New York: Academic Press.
Antinucci, F. (1980). The syntax of indicator particles in Somali. Part two: The construction of interrogative, negative and negative-interrogative clauses. Studies in African Linguistics, 11:1–37.
Anvari, S., Trainor, L. J., Woodside, J., & Levy, B. A. (2002). Relations among musical skills, phonological processing, and early reading ability in preschool children. Journal of Experimental Child Psychology, 83:111–130.
Arcadi, A. C., Robert, D., & Boesch, C. (1998). Buttress drumming by wild chimpanzees: Temporal patterning, phrase integration into loud calls, and preliminary evidence for individual distinctiveness. Primates, 39:503–516.
Areni, C. S., & Kim, D. (1993). The influence of background music on shopping behavior: Classical versus top-forty music in a wine store. Advances in Consumer Research, 20:336–340.
Arnold, K. M., & Jusczyk, P. W. (2002). Text-to-tune alignment in speech and song. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence. Aix-en-Provence, France: Laboratoire Parole et Langage.
Arom, S., Léothaud, G., & Voisin, F. (1997). Experimental ethnomusicology: An interactive approach to the study of musical scales. In: I. Deliége & J. Sloboda (Eds.), Perception and Cognition of Music (pp. 3–30). Hove, UK: Psychology Press.
Arvaniti, A. (1994). Acoustic features of Greek rhythmic structure. Journal of Phonetics, 22:239–268.
Arvaniti, A., & Garding, G. (in press). Dialectal variation in the rising accents of American English. In: J. Hualde & J. Cole (Eds.), Papers in Laboratory Phonology 9. Berlin, Germany: Mouton de Gruyter.
Arvaniti, A., Ladd, D. R., & Mennen, I. (1998). Stability of tonal alignment: The case of Greek prenuclear accents. Journal of Phonetics, 26:3–25.
Ashley, R. (2002). Do[n’t] change a hair for me: The art of jazz rubato. Music Perception, 19:311–332.
Asu, E. L., & Nolan, F. (2006). Estonian and English rhythm: A two-dimensional quantification based on syllables and feet. Proceedings of Speech Prosody 2006, May 2–5, Dresden, Germany.
Atterer, M., & Ladd, D. R. (2004). On the phonetics and phonology of “segmental anchoring” of F0: Evidence from German. Journal of Phonetics, 32:177–197.
Au, T. K.-F., Knightly, L. M., Jun, S.-A., & Oh, J. S. (2002). Overhearing a language during childhood. Psychological Science, 13:238–243.
Ayari, M., & McAdams, S. (2003). Aural analysis of Arabic improvised instrumental music (taqsîm). Music Perception, 21:159–216.
Ayotte, J., Peretz, I., & Hyde, K. L. (2002). Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain, 125:238–251.
Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music agnosia associated with middle cerebral artery infarcts. Brain, 123:1926–1938.
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998). Absolute pitch: An approach for identification of genetic and nongenetic components. American Journal of Human Genetics, 62:224–231.
Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000). Familial aggregation of absolute pitch. American Journal of Human Genetics, 67:755–758.
Baker, M. (2001). The Atoms of Language. New York: Basic Books.
Balaban, E. (1988). Bird song syntax: Learned intraspecific variation is meaningful. Proceedings of the National Academy of Sciences, USA, 85:3657–3660.
Balaban, E. (2006). Cognitive developmental biology: History, process, and fortune’s wheel. Cognition, 101:298–332.
Balaban, E., Teillet, M. A., & Le Douarin, N. (1988). Application of the quail-chick chimera system to the study of brain development and behavior. Science, 241:1339–1342.
Balkwill, L. L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural cues. Music Perception, 17:43–64.
Balter, M. (2004). Seeking the key to music. Science, 306:1120–1122.
Balzano, G. J. (1980). The group-theoretic description of 12-fold and microtonal pitch systems. Computer Music Journal, 4:66–84.
Baptista, L. F., & Keister, R. A. (2005). Why birdsong is sometimes like music. Perspectives in Biology and Medicine, 48:426–443.
Barlow, H., & Morgenstern, S. (1983). A Dictionary of Musical Themes, Revised Edition. London: Faber and Faber.
Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41:254–311.
Barrett, S. (1997). Prototypes in Speech Perception. Ph.D. dissertation, University of Cambridge, Cambridge, UK.
Barrett, S. (2000). The perceptual magnet effect is not specific to speech prototypes: New evidence from music categories. Speech, Hearing and Language: Work in Progress, 11:1–16.
Barry, W. J., Andreeva, B., Russo, M., Dimitrova, S., & Kostadinova, T. (2003). Do rhythm measures tell us anything about language type? Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 2693–2696.
Baruch, C., & Drake, C. (1997). Tempo discrimination in infants. Infant Behavior and Development, 20:573–577.
Bearth, T., & Zemp, H. (1967). The phonology of Dan (Santa). Journal of African Languages, 6:9–28.
Becker, A., & Becker, J. (1979). A grammar of the musical genre Srepegan. Journal of Music Theory, 23:1–43.
Becker, A., & Becker, J. (1983). Reflection on Srepegan: A reconsideration in the form of a dialogue. Asian Music, 14:9–16.
Becker, J. (1979). Time and tune in Java. In: A. L. Becker & A. A. Yengoyan (Eds.), The Imagination of Reality: Essays in Southeast Asian Coherence Systems (pp. 197–210). Norwood, NJ: Ablex.
Becker, J. (1986). Is Western art music superior? Musical Quarterly, 72:341–359.
Becker, J. (2001). Anthropological perspectives on music and emotion. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 135–160). Oxford, UK: Oxford University Press.
Becker, J. (2004). Deep Listeners: Music, Emotion, and Trancing. Bloomington: Indiana University Press.
Becker, J., & Becker, A. (1981). A musical icon: Power and meaning in Javanese gamelan music. In: W. Steiner (Ed.), The Sign in Music and Literature (pp. 203–215). Austin: University of Texas Press.
Beckman, M. (1982). Segment duration and the “mora” in Japanese. Phonetica, 39:113–135.
Beckman, M. (1992). Evidence for speech rhythm across languages. In: Y. Tohkura et al. (Eds.), Speech Perception, Production, and Linguistic Structure (pp. 457–463). Tokyo: IOS Press.
Beckman, M. E., Edwards, J., & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In: G. J. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II: Segment, Gesture, Prosody (pp. 68–86). Cambridge, UK: Cambridge University Press.
Beckman, M. E., Hirschberg, J., & Shattuck-Hufnagel, S. (2005). The original ToBI system and the evolution of the ToBI framework. In: S. Jun (Ed.), Prosodic Typology: The Phonology of Intonation and Phrasing (pp. 9–54). Oxford, UK: Oxford University Press.
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology Yearbook, 3:255–309.
Beeman, M. (1993). Semantic processing in the right hemisphere may contribute to drawing inferences during comprehension. Brain and Language, 44:80–120.
Beeman, M. (1998). Coarse semantic coding and discourse comprehension. In: M. Beeman & C. Chiarello (Eds.), Getting It Right: The Cognitive Neuroscience of Right Hemisphere Language Comprehension (pp. 225–284). Mahwah, NJ: Erlbaum.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403:309–312.
Belleville, S., Caza, N., & Peretz, I. (2003). A neuropsychological argument for a processing view of memory. Journal of Memory and Language, 48:686–703.
Bellugi, U., & St. George, M. (Eds.). (2001). Journey From Cognition to Brain to Gene: Perspectives From Williams Syndrome. Cambridge, MA: MIT Press.
Benamou, M. (2003). Comparing musical affect: Java and the West. The World of Music, 45:57–76.
Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8:1148–1150.
Bent, T., Bradlow, A. R., & Wright, B. A. (2006). The influence of linguistic experience on pitch perception in speech and non-speech sounds. Journal of Experimental Psychology: Human Perception and Performance, 32:97–103.
Bergeson, T. R., & Trehub, S. E. (2002). Absolute pitch and tempo in mothers’ songs to infants. Psychological Science, 13:72–75.
Bergeson, T. R., & Trehub, S. E. (2006). Infants’ perception of rhythmic patterns. Music Perception, 23:345–360.
Berinstein, A. (1979). A cross-linguistic study on the perception of stress. UCLA Working Papers in Phonetics, 47:1–59.
Bernstein, L. (1959). The Joy of Music. New York: Simon & Schuster.
Bernstein, L. (1976). The Unanswered Question. Cambridge, MA: Harvard University Press.
Bertinetto, P. (1989). Reflections on the dichotomy “stress” vs. “syllable-timing.” Revue de Phonétique Appliquee, 91–93:99–130.
Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human Perception and Performance, 21:1278–1296.
Besson, M., Faïta, F., Peretz, I., Bonnel, A.-M., & Requin, J. (1998). Singing in the brain: Independence of lyrics and tunes. Psychological Science, 9:494–498.
Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24:14–25.
Best, C. T. (1994). Learning to perceive the sound pattern of English. In: C. Rovee-Collier & L. Lipsitt (Eds.), Advances in Infancy Research (Vol. 9, pp. 217–304). Norwood, NJ: Ablex.
Best, C. T., & Avery, R. A. (1999). Left-hemisphere advantage for click consonants is determined by linguistic significance and experience. Psychological Science, 10:65–70.
Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. Journal of the Acoustical Society of America, 109:775–794.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14:345–360.
Bharucha, J. J. (1984a). Event hierarchies, tonal hierarchies and assimilation: A reply to Deutsch and Dowling. Journal of Experimental Psychology: General, 113:421–425.
Bharucha, J. J. (1984b). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16:485–518.
Bharucha, J. J. (1987). Music cognition and perceptual facilitation. Music Perception, 5:1–30.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy. Journal of Experimental Psychology: Human Perception and Performance, 12:403–410.
Bharucha, J. J., & Stoeckig, K. (1987). Priming of chords: Spreading activation or overlapping frequency spectra? Perception and Psychophysics, 41:519–524.
Bickerton, D. (1984). The language bioprogram hypothesis. Behavioral and Brain Sciences, 7:173–188.
Bigand, E. (1997). Perceiving musical stability: The effect of tonal structure, rhythm and musical expertise. Journal of Experimental Psychology: Human Perception and Performance, 23:808–812.
Bigand, E. (2003). More about the musical expertise of musically untrained listeners. Annals of the New York Academy of Sciences, 999:304–312.
Bigand, E., Madurell, F., Tillmann, B., & Pineau, M. (1999). Effect of global structure and temporal organization on chord processing. Journal of Experimental Psychology: Human Perception and Performance, 25:184–197.
Bigand, E., & Parncutt, R. (1999). Perceiving musical tension in long chord sequences. Psychological Research, 62:237–254.
Bigand, E., & Pineau, M. (1997). Context effects on musical expectancy. Perception and Psychophysics, 59:1098–1107.
Bigand, E., Poulin, B., Tillmann, B., Madurell, F., & D’Adamo, D. A. (2003). Sensory versus cognitive components in harmonic priming. Journal of Experimental Psychology: Human Perception and Performance, 29:159–171.
Bigand, E., & Poulin-Charronnat, B. (2006). Are we “experienced listeners”? A review of the musical capacities that do not depend on formal musical training. Cognition, 100:100–130.
Bigand, E., Tillmann, B., Poulin, B., D’Adamo, D., & Madurell, F. (2001). The effect of harmonic context on phoneme monitoring in vocal music. Cognition, 81:B11–B20.
Bigand, E., Tillmann, B., & Poulin-Charronnat, B. (2006). A module for syntactic processing in music? Trends in Cognitive Sciences, 10:195–196.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition and Emotion, 19:1113–1139.
Blackburn, P. (1997). Harry Partch. St. Paul, MN: American Composers Forum.
Blacking, J. (1967). Venda Children’s Songs: A Study in Ethnomusicological Analysis. Johannesburg, South Africa: Witwatersrand University Press.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences, USA, 98:11818–11823.
Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience, 2:382–387.
Boemio, A., Fromm, S., Braun, A., & Poeppel, D. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience, 8:389–395.
Bolinger, D. (1958). A theory of pitch accent in English. Word, 14:109–149.
Bolinger, D. (1981). Two Kinds of Vowels, Two Kinds of Rhythm. Bloomington, IN: Indiana University Linguistics Club.
Bolinger, D. (1985). Intonation and Its Parts: Melody in Spoken English. London: Edward Arnold.
Bolton, T. (1894). Rhythm. American Journal of Psychology, 6:145–238.
Boltz, M. G. (1989). Perceiving the end: Effects of tonal relationships on melodic completion. Journal of Experimental Psychology: Human Perception and Performance, 15:749–761.
Boltz, M. G. (1991). Some structural determinants of melody recall. Memory and Cognition, 19:239–251.
Boltz, M. G., & Jones, M. R. (1986). Does rule recursion make melodies easier to reproduce? If not, what does? Cognitive Psychology, 18:389–431.
Bonnel, A.-M., Faïta, F., Peretz, I., & Besson, M. (2001). Divided attention between lyrics and tunes of operatic songs: Evidence for independent processing. Perception and Psychophysics, 63:1201–1213.
Booth, J. R., Burman, D. D., Van Santen, F. W., Harasaki, Y., Gitelman, D. R., Parrish, T. B., et al. (2001). The development of specialized brain systems in reading an oral language. Neuropsychology, Development, and Cognition. Section C, Child Neuropsychology, 7:119–141.
Bor, J. (Ed.). (1999). The Raga Guide: A Survey of 74 Hindustani Ragas. UK: Nimbus Records/Rotterdam Conservatory of Music. (NI 5536/9).
Bradlow, A. R., Nygaard, L. C., & Pisoni, D. B. (1999). Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Perception and Psychophysics, 61:206–219.
Bramble, D. M., & Lieberman, D. E. (2004). Endurance running and the evolution of Homo. Nature, 432:345–352.
Brattico, E., Näätänen, R., & Tervaniemi, M. (2001). Context effects on pitch perception in musicians and nonmusicians: Evidence from event-related potential recordings. Music Perception, 19:199–222.
Bregman, A. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Bretos, J., & Sundberg, J. (2003). Measurements of vibrato parameters in long sustained crescendo notes as sung by ten sopranos. Journal of Voice, 17:343–352.
Brinner, B. (1995). Knowing Music, Making Music: Javanese Gamelan and the Theory of Musical Competence and Interaction. Chicago: University of Chicago Press.
Brosch, M., Selezneva, E., Bucks, C., & Scheich, H. (2004). Macaque monkeys discriminate pitch relationships. Cognition, 91:259–272.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in the brain: A PET study of the generation of melodies and sentences. European Journal of Neuroscience, 23:2791–2803.
Brown, W. A., Cammuso, K., Sachs, H., Winklosky, B., Mullane, J., Bernier, R., et al. (2003). Autism-related language, personality, and cognition in people with absolute pitch: Results of a preliminary study. Journal of Autism and Developmental Disorders, 33:163–167.
Brownell, H. H., Potter, H. H., Bihrle, A. M., & Gardner, H. (1986). Inference deficits in right brain-damaged patients. Brain and Language, 29:310–321.
Bruce, G. (1977). Swedish Word Accents in Sentence Perspective. Lund, Sweden: Gleerup.
Bruce, G. (1981). Tonal and temporal interplay. In: T. Fretheim (Ed.), Nordic Prosody II: Papers From a Symposium (pp. 63–74). Lund, Sweden: Tapir.
Buchanan, T. W., Lutz, K., Mirzazade, S., Specht, K., Shah, N. J., Zilles, K., & Jancke, L. (2000). Recognition of emotional prosody and verbal components of spoken language: An fMRI study. Cognitive Brain Research, 9:227–238.
Buck, J. (1988). Synchronous rhythmic flashing in fireflies. II. Quarterly Review of Biology, 63:265–289.
Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews, Neuroscience, 6:755–765.
Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity: From synapses to maps. Annual Review of Neuroscience, 21:149–186.
Burnham, D., Peretz, I., Stevens, K., Jones, C., Schwanhäusser, B., Tsukada, K., & Bollwerk, S. (2004, August). Do tone language speakers have perfect pitch? In: S. D. Lipscomb et al. (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL (p. 350). Adelaide, Australia: Causal Productions.
Burns, E. M. (1999). Intervals, scales, and tuning. In: D. Deutsch (Ed.), The Psychology of Music (2nd ed., pp. 215–264). San Diego, CA: Academic Press.
Burns, E. M., & Ward, W. D. (1978). Categorical perception—phenomenon or epiphenomenon: Evidence from the perception of melodic musical intervals. Journal of the Acoustical Society of America, 63:456–468.
Busnel, R. G., & Classe, A. (1976). Whistled Languages. Berlin: Springer-Verlag.
Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. Journal of the Acoustical Society of America, 118:471–482.
Campbell, W. N. (1993). Automatic detection of prosodic boundaries in speech. Speech Communication, 13:343–354.
Caplan, D. (1992). Language: Structure, Processing, and Disorders. Cambridge, MA: MIT Press.
Caplan, D., Hildebrandt, N., & Makris, N. (1996). Location of lesions in stroke patients with deficits in syntactic processing in sentence comprehension. Brain, 119:933–949.
Caplan, D., & Waters, G. S. (1999). Verbal working memory and sentence comprehension. Behavioral and Brain Sciences, 22:77–94.
Cariani, P. (2004). A temporal model for pitch multiplicity and tonal consonance. In: S. D. Lipscomb et al. (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL, 2004 (pp. 310–314). Adelaide, Australia: Causal Productions.
Carlsen, J. C. (1981). Some factors which influence melodic expectancy. Psychomusicology, 1:12–29.
Carreiras, M., Lopez, J., Rivero, F., & Corina, D. (2005). Neural processing of a whistled language. Nature, 433:31–32.
Carrington, J. F. (1949a). A Comparative Study of Some Central African Gong-Languages. Brussels, Belgium: Falk G. van Campenhout.
Carrington, J. F. (1949b). Talking Drums of Africa. London: The Carey Kingsgate Press.
Carrington, J. F. (1971). The talking drums of Africa. Scientific American, 225:90–94.
Carroll, S. B. (2003). Genetics and the making of Homo sapiens. Nature, 422:849–857.
Casasanto, D. (in press). Space for thinking. In: V. Evans & P. Chilton (Eds.), Language, Cognition, and Space: State of the Art and New Directions. London: Equinox.
Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music of north India. Journal of Experimental Psychology: General, 113:394–412.
Catchpole, C. K., & Slater, P. J. B. (1995). Bird Song: Biological Themes and Variations. Cambridge, UK: Cambridge University Press.
Chafe, W. (1994). Discourse, Consciousness, and Time. Chicago: University of Chicago Press.
Chandola, A. (1988). Music as Speech: An Ethnomusicolinguistic Study of India. New Delhi, India: Narvang.
Chang, E. F., & Merzenich, M. M. (2003). Environmental noise retards auditory cortical development. Science, 300:498–502.
Charbonneau, S., Scherzer, B. P., Aspirot, D., & Cohen, H. (2002). Perception and production of facial and prosodic emotions by chronic CVA patients. Neuropsychologia, 41:605–613.
Chartrand, J.-P., & Belin, P. (2006). Superior voice timbre processing in musicians. Neuroscience Letters, 405:154–167.
Chela-Flores, B. (1994). On the acquisition of English rhythm: Theoretical and practical issues. International Review of Applied Linguistics, 32:232–242.
Cheney, D. L., & Seyfarth, R. M. (1982). Vervet alarm calls: Semantic communication in free-ranging primates. Animal Behaviour, 28:1070–1266.
Chenoweth, V. (1980). Melodic Perception and Analysis: A Manual on Ethnic Melody. Ukarumpa, Papua New Guinea: Summer Institute of Linguistics.
Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., & Näätänen, R. (1998). Development of language-specific phoneme representations in the infant brain. Nature Neuroscience, 1:351–353.
Chin, C. S. (2003). The early development of absolute pitch: A theory concerning the roles of music training at an early developmental age and individual cognitive style. Psychology of Music, 31:155–171.
Cho, T., & Keating, P. (2001). Articulatory and acoustic studies on domain-initial strengthening in Korean. Journal of Phonetics, 29:155–190.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1972). Language and Mind. New York: Harcourt Brace Jovanovich.
Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. New York: Harper & Row.
Christiansen, M. H., & Kirby, S. (Eds.). (2003a). Language Evolution. Oxford, UK: Oxford University Press.
Christiansen, M. H., & Kirby, S. (2003b). Language evolution: Consensus and controversies. Trends in Cognitive Sciences, 7:300–307.
Clark, A. (2003). Natural-Born Cyborgs. Oxford, UK: Oxford University Press.
Clark, S., & Rehding, A. (2001). Introduction. In: S. Clark & A. Rehding (Eds.), Music Theory and Natural Order From the Renaissance to the Early Twentieth Century (pp. 1–13). Cambridge, UK: Cambridge University Press.
Clarke, E. (2001). Meaning and specification of motion in music. Musicae Scientiae, 5:213–234.
Clarke, E. (2005). Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. Oxford, UK: Oxford University Press.
Clarke, E. F. (1987). Categorical rhythm perception: An ecological perspective. In: A. Gabrielsson (Ed.), Action and Perception in Rhythm and Music (pp. 19–34). Stockholm: Royal Swedish Academy of Music.
Clarke, E. F. (1993). Imitating and evaluating real and transformed musical performances. Music Perception, 10:317–343.
Clarke, E. F., & Krumhansl, C. L. (1990). Perceiving musical time. Music Perception, 7:213–251.
Cloarec-Heiss, F. (1999). From natural language to drum language: An economical encoding procedure in Banda-Linda (Central African Republic). In: C. Fuchs & S. Robert (Eds.), Language Diversity and Cognitive Representations (pp. 145–157). Amsterdam: John Benjamins.
Clough, J., Douthett, J., Ramanathan, N., & Rowell, L. (1993). Early Indian heptatonic scales and recent diatonic theory. Music Theory Spectrum, 15:36–58.
Clynes, M. (1977). Sentics: The Touch of Emotions. Garden City, NY: The Anchor Press.
Cogan, R. (1984). New Images of Musical Sound. Cambridge, MA: Harvard University Press.
Cogan, R., & Escot, P. (1976). Sonic Design: The Nature of Sound and Music. Englewood Cliffs, NJ: Prentice-Hall.
Cohen, A. J. (2000). Development of tonality induction: Plasticity, exposure, and training. Music Perception, 17:437–459.
Cohen, A. J. (2001). Music as a source of emotion in film. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 249–272). Oxford, UK: Oxford University Press.
Cohen, A. J., Thorpe, L. A., & Trehub, S. E. (1987). Infants’ perception of musical relations in short transposed tone sequences. Canadian Journal of Psychology, 41:33–47.
Cohen, D. (1971). Palestrina counterpoint: A musical expression of unexcited speech. Journal of Music Theory, 15:85–111.
Cohen, L. S., Lehericy, F., Chochon, F., Lemer, C., Rivaud, S., & Dehaene, S. (2002). Language-specific tuning of visual cortex? Functional properties of the visual word form area. Brain, 125:1054–1069.
Coker, W. (1972). Music and Meaning: A Theoretical Introduction to Musical Aesthetics. New York: Free Press.
Coleman, J. (1999). The nature of vocoids associated with syllabic consonants in Tashlhiyt Berber. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, pp. 735–738.
Collier, R. (1975). Physiological correlates of intonation patterns. Journal of the Acoustical Society of America, 58:249–255.
Collier, R. (1991). Multi-language intonation synthesis. Journal of Phonetics, 19:61–73.
Comrie, B., Matthews, S., & Polinsky, M. (Eds.). (1996). The Atlas of Languages. London: Quarto.
Cone, E. (1974). The Composer’s Voice. Berkeley, CA: University of California Press.
Connell, B. (1999). Four tones and downtrend: A preliminary report of pitch realization in Mambila. In: P. F. A. Kotey (Ed.), New Dimensions in African Linguistics and Languages (Trends in African Linguistics 3) (pp. 74–88). Trenton, NJ: Africa World Press.
Connell, B. (2000). The perception of lexical tone in Mambila. Language and Speech, 43:163–182.
Conway, C. M., & Christiansen, M. H. (2001). Sequential learning in non-human primates. Trends in Cognitive Sciences, 5:539–546.
Cook, N. (1987a). A Guide to Musical Analysis. Oxford, UK: Oxford University Press.
Cook, N. (1987b). The perception of large-scale tonal closure. Music Perception, 5:197–206.
Cook, N., & Dibben, N. (2001). Musicological approaches to emotion. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 45–70). Oxford, UK: Oxford University Press.
Cook, N. D. (2002). Tone of Voice and Mind. Amsterdam: John Benjamins.
Cook, N. D., & Fujisawa, T. X. (2006). The psychophysics of harmony perception: Harmony is a three-tone phenomenon. Empirical Musicology Review, 1:106–126.
Cooke, D. (1959). The Language of Music. Oxford, UK: Oxford University Press.
Cooke, P. (1992). Report on pitch perception experiments carried out in Buganda and Busoga (Uganda). Journal of International Library of African Music, 7:119–125.
Cooper, G. W., & Meyer, L. B. (1960). The Rhythmic Structure of Music. Chicago: University of Chicago Press.
Cooper, W. E., & Eady, S. J. (1986). Metrical phonology in speech production. Journal of Memory and Language, 25:369–384.
Cooper, W. E., & Sorensen, J. (1977). Fundamental frequency contours at syntactic boundaries. Journal of the Acoustical Society of America, 62:683–692.
Coppola, M. (2002). The Emergence of Grammatical Categories in Homesign: Evidence From Family-Based Gesture Systems in Nicaragua. Ph.D. dissertation, University of Rochester.
Costa-Giomi, E. (2003). Young children’s harmonic perception. Annals of the New York Academy of Sciences, 999:477–484.
Courtney, D. (1998). Fundamentals of Tabla (3rd ed.). Houston, TX: Sur Sangeet Services.
Cowan, G. (1948). Mazateco whistle speech. Language, 24:280–286.
Croonen, W. J. M. (1994). Memory for Melodic Patterns: An Investigation of Stimulus- and Subject-Related Characteristics. Ph.D. dissertation, Technische Universiteit Eindhoven, The Netherlands.
Cross, I. (2001). Review of The Origins of Music. Music Perception, 18:513–521.
Cross, I. (2003). Music, cognition, culture, and evolution. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 42–56). Cambridge, MA: MIT Press.
Cross, I. (submitted). Musicality and the human capacity for culture.
Cuddy, L. L., & Badertscher, B. (1987). Recovery of the tonal hierarchy: Some comparisons across age and levels of musical experience. Perception and Psychophysics, 41:609–620.
Cuddy, L. L., Balkwill, L.-L., Peretz, I., & Holden, R. R. (2005). A study of “tone deafness” among university students. Annals of the New York Academy of Sciences, 1060:311–324.
Cuddy, L. L., Cohen, A. J., & Mewhort, D. J. K. (1981). Perception of structure in short melodic sequences. Journal of Experimental Psychology: Human Perception and Performance, 7:869–883.
Cuddy, L. L., & Lunney, C. A. (1995). Expectancies generated by melodic intervals: Perceptual judgments and melodic continuity. Perception and Psychophysics, 57:451–462.
Cuddy, LL, & Lyons, HI (1981)。音乐模式识别:聆听和研究音调结构和音调歧义的比较。心理音乐学, 1:15–33。
Cuddy, L. L., & Lyons, H. I. (1981). Musical pattern recognition: A comparison of listening to and studying tonal structure and tonal ambiguities. Psychomusicology, 1:15–33.
N. 卡明 (2000)。声波自我:音乐的主观性和意义。布卢明顿:印第安纳大学出版社。
Cumming, N. (2000). The Sonic Self: Musical Subjectivity and Signification. Bloomington: Indiana University Press.
F. 康明斯 (2002)。言语节奏和节奏分类学。载于:B. Bell & I. Marlien(编),Proceedings of Speech Prosody,普罗旺斯地区艾克斯(第 121-126 页)。法国普罗旺斯地区艾克斯:Parole et Langage。
Cummins, F. (2002). Speech rhythm and rhythmic taxonomy. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence (pp. 121–126). Aix-en-Provence, France: Laboratoire Parole et Langage.
Cummins, F., & Port, RF (1998)。英语重音时间的节奏限制。语音学杂志, 26:145-171。
Cummins, F., & Port, R. F. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26:145–171.
柯蒂斯 S. (1977)。精灵:现代“野孩子”的心理语言学研究。纽约:学术出版社。
Curtiss, S. (1977). Genie: A Psycholinguistic Study of a Modern-Day “Wild Child.” New York: Academic Press.
A. 卡特勒 (1980)。音节遗漏错误和等时性。载于:HW Dechert & M. Raupach(编辑),语音中的时间变量(第 183-190 页)。荷兰海牙:木桐。
Cutler, A. (1980). Syllable omission errors and isochrony. In: H. W. Dechert & M. Raupach (Eds.), Temporal Variables in Speech (pp. 183–190). The Hague, The Netherlands: Mouton.
A. 卡特勒 (1990)。在语音分割中利用韵律概率。载于:G. Altmann(主编),语音处理的认知模型:心理语言学和计算视角(第 105-121 页)。马萨诸塞州剑桥市:麻省理工学院出版社。
Cutler, A. (1990). Exploiting prosodic probabilities in speech segmentation. In: G. Altmann (Ed.), Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives (pp. 105–121). Cambridge, MA: MIT Press.
A. 卡特勒 (1994)。分段问题,有节奏的解决方案。语言, 92:81–104。
Cutler, A. (1994). Segmentation problems, rhythmic solutions. Lingua, 92:81–104.
A. 卡特勒 (2000)。通过第一语言的耳朵聆听第二语言。口译, 5:1-23。
Cutler, A. (2000). Listening to a second language through the ears of a first. Interpreting, 5:1–23.
Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31:218–236.
Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2:133–142.
Cutler, A., Dahan, D., & Van Donselaar, W. A. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40:141–202.
Cutler, A., & Darwin, C. J. (1981). Phoneme-monitoring reaction time and preceding prosody: Effects of stop closure duration and of fundamental frequency. Perception and Psychophysics, 29:217–224.
Cutler, A., & Foss, D. J. (1977). On the role of sentence stress in sentence processing. Language and Speech, 20:1–10.
Cutler, A., & Norris, D. G. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14:113–121.
d’Alessandro, C., & Castellengo, M. (1994). The pitch of short-duration vibrato tones. Journal of the Acoustical Society of America, 95:1617–1630.
d’Alessandro, C., & Mertens, P. (1995). Automatic pitch contour stylization using a model of tonal perception. Computer Speech and Language, 9:257–288.
Dahlhaus, C. (1990). Studies on the Origin of Harmonic Tonality (R. O. Gjerdingen, Trans.). Princeton, NJ: Princeton University Press.
Dainora, A. (2002). Does intonational meaning come from tones or tune? Evidence against a compositional approach. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence. Aix-en-Provence, France: Laboratoire Parole et Langage.
Dalla Bella, S., Giguere, J.-F., & Peretz, I. (2007). Singing proficiency in the general population. Journal of the Acoustical Society of America, 121:1182–1189.
Dalla Bella, S., Palmer, C., & Jungers, M. (2003). Are musicians different speakers than nonmusicians? Proceedings of the 2003 Meeting of the Society for Music Perception and Cognition, Las Vegas, NV (p. 34).
Dalla Bella, S., & Peretz, I. (2003). Congenital amusia interferes with the ability to synchronize with music. Annals of the New York Academy of Sciences, 999:166–169.
Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the affective value of tempo and mode in music. Cognition, 80:B1–B10.
Damasio, A. (1994). Descartes’ Error: Emotion, Reason, and the Human Brain. New York: Avon Books.
Damasio, A. (2003). Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. Orlando, FL: Harcourt.
Daniele, J. R., & Patel, A. D. (2004). The interplay of linguistic and historical influences on musical rhythm in different cultures. In: S. D. Lipscomb et al. (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL, 2004 (pp. 759–762). Adelaide, Australia: Causal Productions.
Darwin, C. (1871). The Descent of Man, and Selection in Relation to Sex. London: John Murray.
Dasher, R., & Bolinger, D. (1982). On pre-accentual lengthening. Journal of the International Phonetic Association, 12:58–69.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11:51–62.
Dauer, R. M. (1987). Phonetic and phonological components of language rhythm. Proceedings of the 11th International Congress of Phonetic Sciences, Tallinn, 5:447–450.
Davidson, L., McKernon, P., & Gardner, H. (1981). The acquisition of song: A developmental approach. In: Proceedings of the National Symposium on the Application of Psychology to the Teaching and Learning of Music. Reston, VA: Music Educators National Conference.
Davies, S. (1980). The expression of emotion in music. Mind, 89:67–86.
Davies, S. (1994). Musical Meaning and Expression. Ithaca, NY: Cornell University Press.
Davies, S. (2002). Profundity in instrumental music. British Journal of Aesthetics, 42:343–356.
Davies, S. (2003). Themes in the Philosophy of Music. Oxford, UK: Oxford University Press.
Davis, M. H., & Johnsrude, I. S. (2007). Hearing speech sounds: Top-down influences on the interface between audition and speech perception. Hearing Research, 229:132–147.
de Jong, K. J. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America, 97:491–504.
de Pijper, J. R. (1983). Modeling British English Intonation. Dordrecht, The Netherlands: Foris.
de Pijper, J. R., & Sanderman, A. A. (1994). On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues. Journal of the Acoustical Society of America, 96:2037–2047.
Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Language and the Brain. New York: W. W. Norton.
Deacon, T. W. (2003). Universal grammar and semiotic constraints. In: M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 111–139). Oxford, UK: Oxford University Press.
DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers’ voices. Science, 208:1174–1176.
DeCasper, A. J., Lecanuet, J.-P., Busnel, M.-C., Granier-Deferre, C., & Maugeais, R. (1994). Fetal reactions to recurrent maternal speech. Infant Behavior and Development, 17:159–164.
DeCasper, A. J., & Spence, M. J. (1986). Prenatal maternal speech influences newborns’ perception of speech sounds. Infant Behavior and Development, 9:133–150.
Delattre, P. (1963). Comparing the prosodic features of English, German, Spanish and French. International Review of Applied Linguistics, 1:193–210.
Delattre, P. (1966). A comparison of syllable length conditioning among languages. International Review of Applied Linguistics, 4:183–198.
Deliege, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl and Jackendoff’s grouping preference rules. Music Perception, 4:325–360.
Deliege, I., Melen, M., Stammers, D., & Cross, I. (1996). Musical schemata in real-time listening to a piece of music. Music Perception, 14:117–160.
Dell, F. (1989). Concordances rythmiques entre la musique et les paroles dans le chant. In: M. Dominicy (Ed.), Le Souci des Apparences (pp. 121–136). Brussels, Belgium: Editions de l’Universite de Bruxelles.
Dell, F., & Halle, J. (in press). Comparing musical textsetting in French and English songs. In: J.-L. Aroui (Ed.), Proceedings of the Conference Typology of Poetic Forms, April 2005, Paris.
Dellwo, V. (2004). The BonnTempo-Corpus & BonnTempo-Tools: A database for the study of speech rhythm and rate. In: Proceedings of the 8th ICSLP, Jeju Island, Korea.
DeLong, M. R. (2000). The basal ganglia. In: E. Kandel, J. H. Schwarz, & T. M. Jessell (Eds.), Principles of Neural Science (4th ed., pp. 853–867). New York: McGraw-Hill.
Demany, L., & Armand, F. (1984). The perceptual reality of tone chroma in early infancy. Journal of the Acoustical Society of America, 76:57–66.
Demany, L., & McAnally, K. I. (1994). The perception of frequency peaks and troughs in wide frequency modulations. Journal of the Acoustical Society of America, 96:706–715.
Demany, L., McKenzie, B., & Vurpillot, E. (1977). Rhythm perception in early infancy. Nature, 266:718–719.
Denora, T. (1999). Music as a technology of the self. Poetics: Journal of Empirical Research on Literature, the Media, and the Arts, 26:1–26.
Denora, T. (2001). Aesthetic agency and musical practice: New directions in the sociology of music and emotion. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 161–180). Oxford, UK: Oxford University Press.
Desain, P. (1992). A (de)composable theory of rhythm perception. Music Perception, 9:439–454.
Desain, P., & Honing, H. (1999). Computational models of beat induction: The rule-based approach. Journal of New Music Research, 28:29–42.
Deutsch, D. (1978). Delayed pitch comparisons and the principle of proximity. Perception and Psychophysics, 23:227–230.
Deutsch, D., & Feroe, J. (1981). The internal representation of pitch sequences in tonal music. Psychological Review, 88:503–522.
Deutsch, D., Henthorn, T., & Dolson, M. (2004). Absolute pitch, speech, and tone language: Some experiments and a proposed framework. Music Perception, 21:339–356.
Deutsch, D., Henthorn, T., Marvin, E., & Xu, H.-S. (2006). Absolute pitch among American and Chinese conservatory students: Prevalence differences, and evidence for a speech-related critical period. Journal of the Acoustical Society of America, 119:719–722.
Dewitt, L. A., & Crowder, R. G. (1986). Recognition of novel melodies after brief delays. Music Perception, 3:259–274.
Di Cristo, A. (1998). Intonation in French. In: D. Hirst & A. Di Cristo (Eds.), Intonation Systems: A Survey of Twenty Languages (pp. 195–218). Cambridge, UK: Cambridge University Press.
Di Pietro, M., Laganaro, M., Leemann, B., & Schnider, A. (2003). Amusia: Selective rhythm processing following left temporoparietal lesion in a professional musician with conduction aphasia. Brain and Language, 87:152–153.
Dibben, N. (2001). What do we hear, when we hear music? Music perception and musical material. Musicae Scientiae, 5:161–194.
Diehl, R. L., Lindblom, B., & Creeger, C. P. (2003). Increasing realism of auditory representations yields further insights into vowel phonetics. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 1381–1384.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55:149–179.
Dilley, L. (2005). The Phonetics and Phonology of Tonal Systems. Ph.D. dissertation, MIT.
Dilley, L., Shattuck-Hufnagel, S., & Ostendorf, M. (1996). Glottalization of word-initial vowels as a function of prosodic structure. Journal of Phonetics, 24:423–444.
Dissanayake, E. (2000). Antecedents of the temporal arts in early mother-infant interaction. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 389–410). Cambridge, MA: MIT Press.
Dloniak, S. M., & Deviche, P. (2001). Effects of testosterone and photoperiodic condition on song production and vocal control region volumes in adult male Dark-Eyed Juncos (Junco hyemalis). Hormones and Behavior, 39:95–105.
Docherty, G., & Foulkes, P. (1999). Instrumental phonetics and phonological variation: Case studies from Newcastle upon Tyne and Derby. In: P. Foulkes & G. Docherty (Eds.), Urban Voices: Accent Studies in the British Isles (pp. 47–71). London: Arnold.
Donovan, A., & Darwin, C. J. (1979). The perceived rhythm of speech. Proceedings of the 9th International Congress of Phonetic Sciences, Copenhagen, 2:268–274.
Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing. Nature Neuroscience, 10:915–921.
Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22:567–631.
Doupe, A. J., Perkel, D. J., Reiner, A., & Stern, E. A. (2005). Birdbrains could teach basal ganglia research a new song. Trends in Neurosciences, 28:353–363.
Dowling, W. J. (1973). Rhythmic groups and subjective chunks in memory for melodies. Perception and Psychophysics, 14:37–40.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 85:341–354.
Dowling, W. J. (1986). Context effects on melody recognition: Scale-steps versus interval representations. Music Perception, 3:281–296.
Dowling, W. J. (1988). Tonal structure and children’s early learning of music. In: J. A. Sloboda (Ed.), Generative Processes in Music (pp. 113–128). Oxford, UK: Oxford University Press.
Dowling, W. J. (2001). Perception of music. In: E. B. Goldstein (Ed.), Blackwell Handbook of Perception (pp. 469–498). Malden, MA: Blackwell.
Dowling, W. J., & Bartlett, J. C. (1981). The importance of interval information in long-term memory for melodies. Psychomusicology, 1:30–49.
Dowling, W. J., & Harwood, D. L. (1986). Music Cognition. Orlando, FL: Academic Press.
Dowling, W. J., Kwak, S., & Andrews, M. W. (1995). The time course of recognition of novel melodies. Perception and Psychophysics, 57:136–149.
Drake, C. (1998). Psychological processes involved in the temporal organization of complex auditory sequences: Universal and acquired processes. Music Perception, 16:11–26.
Drake, C., & Ben El Heni, J. (2003). Synchronizing with music: Intercultural differences. Annals of the New York Academy of Sciences, 999:428–437.
Drake, C., & Botte, M. C. (1993). Tempo sensitivity in auditory sequences: Evidence for a multiple-look model. Perception and Psychophysics, 54:277–286.
Drake, C., Jones, M., & Baruch, C. (2000). The development of rhythmic attending in auditory sequence: Attunement, reference period, focal attending. Cognition, 77:251–288.
Drake, C., Penel, A., & Bigand, E. (2000). Tapping in time with mechanically and expressively performed music. Music Perception, 18:1–24.
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of musical pitch recognition in humans. Science, 291:1969–1972.
Dunbar, R. I. (2003). The origin and subsequent evolution of language. In: M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 219–234). Oxford, UK: Oxford University Press.
Dupoux, E., Peperkamp, S., & Sebastian-Galles, N. (2001). A robust method to study stress “deafness.” Journal of the Acoustical Society of America, 110:1606–1618.
Eady, S. J. (1982). Differences in the F0 patterns of speech: Tone language versus stress language. Language and Speech, 25:29–42.
Earle, M. A. (1975). An Acoustic Study of Northern Vietnamese Tones (Monograph 11). Santa Barbara, CA: Speech Communications Research Laboratory.
Edelman, G. M. (2006). Second Nature: Brain Science and Human Knowledge. New Haven, CT: Yale University Press.
Edmondson, J. A., & Gregerson, K. J. (1992). On five-level tone systems. In: S. J. Hwang & W. R. Merrifield (Eds.), Language in Context: Essays for Robert E. Longacre (pp. 555–576). Dallas, TX: Summer Institute of Linguistics.
Edworthy, J. (1985a). Interval and contour in melody processing. Music Perception, 2:375–388.
Edworthy, J. (1985b). Melodic contour and musical structure. In: P. Howell, I. Cross, & R. West (Eds.), Musical Structure and Cognition (pp. 169–188). London: Academic Press.
Eerola, T., Luck, G., & Toiviainen, P. (2006). An investigation of pre-schoolers’ corporeal synchronization with music. In: M. Baroni, A. R. Addessi, R. Caterina, & M. Costa (Eds.), Proceedings of the 9th International Conference on Music Perception and Cognition (ICMPC9), Bologna, Italy, pp. 472–476.
Egnor, S. E. R., & Hauser, M. D. (2004). A paradox in the evolution of primate vocal learning. Trends in Neurosciences, 27:649–654.
Ehresman, D., & Wessel, D. (1978). Perception of timbral analogies. Rapports Ircam, 13/78.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971). Speech perception by infants. Science, 171:303–306.
Eisler, H. (1976). Experiments on subjective duration 1868–1975: A collection of power function exponents. Psychological Bulletin, 83:1154–1171.
Ekman, P., Friesen, W., O’Sullivan, M., et al. (1987). Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology, 53:712–717.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased use of the left hand in string players associated with increased cortical representation of the fingers. Science, 270:305–307.
Elbert, T., Ulrich, R., Rockstroh, B., & Lutzenberger, W. (1991). The processing of temporal intervals reflected by CNV-like brain potentials. Psychophysiology, 28:648–655.
Elfenbein, H. A., & Ambady, N. (2003). Universals and cultural differences in recognizing emotions. Current Directions in Psychological Science, 12:159–164.
Ellis, A. (1885). On the musical scales of various nations. Journal of the Royal Society of Arts, 33:485–527.
Elman, J. (1999). The emergence of language: A conspiracy theory. In: B. MacWhinney (Ed.), The Emergence of Language (pp. 1–27). Mahwah, NJ: Erlbaum.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press.
Emmorey, K. (2002). Language, Cognition, and the Brain: Insights From Sign Language Research. Mahwah, NJ: Lawrence Erlbaum.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L., Wiebe, V., Kitano, T., et al. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature, 418:869–872.
Escoffier, N., & Tillmann, B. (2006). Tonal function modulates speed of visual processing. In: M. Baroni, A. R. Addessi, R. Caterina, & M. Costa (Eds.), Proceedings of the 9th International Conference on Music Perception and Cognition (ICMPC9), Bologna, Italy, August 22–26, p. 1878.
Everett, D. L. (2005). Cultural constraints on grammar and cognition in Piraha: Another look at the design features of human language. Current Anthropology, 46:621–646.
Faber, D. (1986). Teaching the rhythms of English: A new theoretical base. International Review of Applied Linguistics, 24:205–216.
Falk, D. (2004a). Prelinguistic evolution in early hominins: Whence motherese? (Target article). Behavioral and Brain Sciences, 27:491–503.
Falk, D. (2004b). The “putting the baby down” hypothesis: Bipedalism, babbling, and baby slings (Response to commentaries). Behavioral and Brain Sciences, 27:526–534.
Fant, G., Kruckenberg, A., & Nord, L. (1991a). Durational correlates of stress in Swedish, French and English. Journal of Phonetics, 19:351–365.
Fant, G., Kruckenberg, A., & Nord, L. (1991b). Stress patterns and rhythm in the reading of prose and poetry with analogies to music performance. In: J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, Language, Speech and Brain (pp. 380–407). London: Macmillan.
Farnsworth, P. R. (1954). A study of the Hevner adjective list. Journal of Aesthetics and Art Criticism, 13:97–103.
Fassbender, C. (1996). Infant’s auditory sensitivity toward acoustic parameters of speech and music. In: I. Deliege & J. Sloboda (Eds.), Musical Beginnings (pp. 56–87). Oxford, UK: Oxford University Press.
Fedorenko, E., Patel, A. D., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory and Cognition, 37:1–9.
Feld, S. (1974). Linguistic models in ethnomusicology. Ethnomusicology, 18:197–217.
Feld, S., & Fox, A. A. (1994). Music and language. Annual Review of Anthropology, 23:25–53.
Ferland, M. B., & Mendelson, M. J. (1989). Infant’s categorization of melodic contour. Infant Behavior and Development, 12:341–355.
Fernald, A. (1985). Four-month-olds prefer to listen to motherese. Infant Behavior and Development, 8:181–195.
Fernald, A. (1992). Meaningful melodies in mothers’ speech to infants. In: H. Papousek, U. Jurgens, & M. Papousek (Eds.), Nonverbal Vocal Communication: Comparative and Developmental Aspects (pp. 262–282). Cambridge, UK: Cambridge University Press.
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development, 10:279–293.
Fernald, A., Taeschner, T., Dunn, J., Papousek, M., Boysson-Bardies, B., & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16:477–501.
Ferreira, F. (1991). The creation of prosody during sentence production. Psychological Review, 100:233–253.
Filk, E. (1977). Tone glides and registers in five Dan dialects. Linguistics, 201:5–59.
Fishman, Y. I., Volkov, I. O., Noh, M. D., Garell, P. C., Bakken, H., Arezzo, J. C., et al. (2001). Consonance and dissonance of musical chords: Neural correlates in auditory cortex of monkeys and humans. Journal of Neurophysiology, 86:271–278.
Fitch, W. T. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4:258–267.
Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective. Cognition, 100:173–215.
Fitch, W. T., & Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. Journal of the Acoustical Society of America, 106:1511–1522.
Fitch, W. T., & Hauser, M. D. (2004). Computational constraints on syntactic processing in a nonhuman primate. Science, 303:377–380.
Floccia, C., Nazzi, T., & Bertoncini, J. (2000). Unfamiliar voice discrimination for short stimuli in newborns. Developmental Science, 3:333–343.
Fodor, J. A. (1983). Modularity of Mind. Cambridge, MA: MIT Press.
Foris, D. P. (2000). A Grammar of Sochiapan Chinantec. Dallas, TX: SIL International.
Fougeron, C., & Jun, S.-A. (1998). Rate effects on French intonation: Prosodic organization and phonetic realization. Journal of Phonetics, 26:45–69.
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct realist perspective. Journal of Phonetics, 14:3–28.
Fowler, C. A., Brown, J., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory and Language, 49:296–314.
Foxton, JM、Dean, JL、Gee, R.、Peretz, I. 和 Griffiths, TD (2004)。“音聋”背后的音调感知缺陷的特征。大脑, 127:801–810。
Foxton, J. M., Dean, J. L., Gee, R., Peretz, I., & Griffiths, T. D. (2004). Characterisation of deficits in pitch perception underlying “tone deafness.” Brain, 127:801–810.
Foxton, JM、Nandy, RK 和 Griffiths, TD (2006)。“音聋”中的节奏缺陷。大脑与认知, 62:24-29。
Foxton, J. M., Nandy, R. K., & Griffiths, T. D. (2006). Rhythm deficits in “tone-deafness.” Brain and Cognition, 62:24–29.
Foxton, JM、Talcott, JB、Witton, C.、Brace, H.、McIntyre, F. 和 Griffiths, TD (2003)。阅读技巧与全局而非局部的声学模式感知有关。自然神经科学, 6:343–4。
Foxton, J. M., Talcott, J. B., Witton, C., Brace, H., McIntyre, F., & Griffiths, T. D. (2003). Reading skills are related to global, but not local, acoustic pattern perception. Nature Neuroscience, 6:343–4.
Fraisse, P. (1982)。节奏和速度。载于:D. Deutsch(主编) ,音乐心理学(第 149-180 页)。纽约:学术出版社。
Fraisse, P. (1982). Rhythm and tempo. In: D. Deutsch (Ed.), The Psychology of Music (pp. 149–180). New York: Academic Press.
弗朗西丝,R. (1988)。音乐的感知(WJ Dowling, Trans.)。新泽西州希尔斯代尔:Erlbaum。
Frances, R. (1988). The Perception of Music (W. J. Dowling, Trans.). Hillsdale, NJ: Erlbaum.
Frances, R.、Lhermitte, F. 和 Verdy, M. (1973)。Le deficit musical des aphasiques。Revue Internationale de Psychologie Appliquee, 22:117-135。
Frances, R., Lhermitte, F., & Verdy, M. (1973). Le deficit musical des aphasiques. Revue Internationale de Psychologie Appliquee, 22:117–135.
Francis, AL、Ciocca, V. 和 Ng, BKC (2003)。关于词汇声调的(非)分类感知。感知与心理物理学, 65:1029-1044。
Francis, A. L., Ciocca, V., & Ng, B. K. C. (2003). On the (non)categorical perception of lexical tones. Perception and Psychophysics, 65:1029–1044.
Frankland, BW, & Cohen, AJ (2004)。旋律解析:Lerdahl 和 Jackendoff 的音调音乐生成理论的局部分组规则的量化和测试。音乐感知, 21:499-543。
Frankland, B. W., & Cohen, A. J. (2004). Parsing of melody: Quantification and testing of the local grouping rules of Lerdahl and Jackendoff’s A Generative Theory of Tonal Music. Music Perception, 21: 499–543.
Friberg, A., & Sundberg, J. (1999)。音乐表演是否暗示运动?最终 ritardandi 的模型源自对停止跑步者的测量。美国声学学会杂志, 105:1469–1484。
Friberg, A., & Sundberg, J. (1999). Does music performance allude to locomotion? A model of final ritardandi derived from measurements of stopping runners. Journal of the Acoustical Society of America, 105:1469–1484.
Friederici, AD (2002)。迈向听觉句子处理的神经基础。认知科学趋势, 6:78–84。
Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6:78–84.
Fries, W., & Swihart, A. A. (1990). Disturbance of rhythm sense following right hemisphere damage. Neuropsychologia, 28:1317–1323.
Fromkin, V. (Ed.). (1978). Tone: A Linguistic Survey. New York: Academic Press.
Frota, S., & Vigario, M. (2001). On the correlates of rhythmic distinctions: The European/Brazilian Portuguese case. Probus, 13:247–275.
Fry, D. B., Abramson, A. S., Eimas, P. D., & Liberman, A. M. (1962). Identification and discrimination of synthetic vowels. Language and Speech, 5:171–189.
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience, 16:1010–1021.
Fussell, P. (1974). Meter. In: A. Priminger, (Ed.), Princeton Encyclopedia of Poetry and Poetics (pp. 496–500). Princeton, NJ: Princeton University Press.
Fussell, P. (1979). Poetic Meter and Poetic Form (Rev. ed.). New York: Random House.
Gabriel, C. (1978). An experimental study of Deryck Cooke’s theory of music and meaning. Psychology of Music, 9:44–53.
Gabrielsson, A. (1973). Similarity ratings and dimension analyses of auditory rhythm patterns. I and II. Scandinavian Journal of Psychology, 14:138–160, 161–175.
Gabrielsson, A. (1993). The complexities of rhythm. In: T. Tighe & W. J. Dowling (Eds.), Psychology and Music: The Understanding of Melody and Rhythm. Hillsdale, NJ: Erlbaum.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer’s intention and the listener’s experience. Psychology of Music, 24:68–91.
Gabrielsson, A., & Lindstrom, E. (2001). The influence of musical structure on emotional expression. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 223–248). Oxford, UK: Oxford University Press.
Gabrielsson, A., & Lindstrom Wik, S. (2003). Strong experiences related to music: A descriptive system. Musicae Scientiae, 7:157–217.
Galizio, M., & Hendrick, C. (1972). Effect of music accompaniment on attitude: The guitar as a prop for persuasion. Journal of Applied Social Psychology, 2:350–359.
Galves, A., Garcia, J., Duarte, D., & Galves, C. (2002). Sonority as a basis for rhythmic class discrimination. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence. Aix-en-Provence, France: Laboratoire Parole et Langage.
Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., & Hutchins, G. (2000). A crosslinguistic PET study of tone perception. Journal of Cognitive Neuroscience, 12:207–222.
Garfias, R. (1987). Thoughts on the process of language and music acquisition. In: F. R. Wilson & F. L. Roehmann (Eds.), The Biology of Music Making: Music and Child Development (pp. 100–105). St. Louis, MO: MMB Music.
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience, 23:9240–9245.
Gee, J. P., & Grosjean, F. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology, 15:411–458.
Geissmann, T. (2000). Gibbon songs and human music from an evolutionary perspective. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 102–123). Cambridge, MA: MIT Press.
Genter, D., & Goldin-Meadow, S. (Eds.). (2003). Language in Mind. Cambridge, MA: MIT Press.
Gentner, T., Fenn, K. M., Margoliash, D., & Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature, 440:1204–1207.
Gentner, T. Q., & Hulse, S. H. (1998). Perceptual mechanisms for individual recognition in European starlings (Sturnus vulgaris). Animal Behaviour, 56:579–594.
George, M. S., Parekh, P. I., Rosinksy, N., Ketter, T., Kimbrall, T. A., et al. (1996). Understanding emotional prosody activates right hemisphere regions. Archives of Neurology, 53:665–670.
Gerardi, G. M., & Gerken, L. (1995). The development of affective response to modality and melodic contour. Music Perception, 12:279–290.
Gerhardt, H. C., & Huber, F. (2002). Acoustic Communication in Insects and Anurans. Chicago: University of Chicago Press.
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68:1–76.
Gibson, E. (2000). The dependency locality theory: A distance-based theory of linguistic complexity. In: A. Marantaz, Y. Miyashita, & W. O’Neil (Eds.), Image, Language, Brain (pp. 95–126). Cambridge, MA: MIT Press.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Giddings, R. (1984). Musical Quotes and Anecdotes. Burnt Mill, UK: Longman.
Giguere, J.-F., Dalla Bella, S., & Peretz, I. (2005). Singing Abilities in Congenital Amusia. Poster presented at the Cognitive Neuroscience Society meeting, New York, April 10–12.
Giles, H., Coupland, N., & Coupland, J. (Eds.). (1991). Contexts of Accommodation: Developments in Applied Sociolinguistics. Cambridge, UK: Cambridge University Press.
Gitschier, J., Athos, A., Levinson, B., Zemansky, J., Kistler, A., & Freimer, N. (2004). Absolute pitch: Genetics and perception. In: S. D. Lipscomb et al. (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL (pp. 351–352). Adelaide, Australia: Causal Productions.
Gjerdingen, R. O. (2007). Music in the Galant Style. New York: Oxford University Press.
Goldin-Meadow, S. (1982). The resilience of recursion: A study of a communication system developed without a conventional language model. In: E. Wanner & L. R. Gleitman (Eds.), Language Acquisition: The State of the Art (pp. 51–77). New York: Cambridge University Press.
Goldstein, A. (1980). Thrills in response to music and other stimuli. Physiological Psychology, 8:126–129.
Goldstein, M., King, A., & West, M. (2003). Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences, USA, 100:8030–8035.
Gopnik, M. (1990). Feature-blind grammar and dysphasia. Nature, 344:715.
Gopnik, M., & Crago, M. B. (1991). Familial aggregation of a developmental language disorder. Cognition, 39:1–50.
Gordon, J. W. (1987). The perceptual attack time of musical tones. Journal of the Acoustical Society of America, 82:88–105.
Gordon, P. C., Hendrick, R., & Johnson, M. (2001). Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory and Cognition, 27:1411–1423.
Goto, H. (1971). Auditory perception by normal Japanese adults of the sounds “L” and “R.” Neuropsychologia, 9:317–327.
Gouvea, A., Phillips, C., Kazanina, N., & Poeppel, D. (submitted). The Linguistic Processes Underlying the P600.
Grabe, E. (2002). Variation adds to prosodic typology. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence (pp. 127–132). Aix-en-Provence, France: Laboratoire Parole et Langage.
Grabe, E., Gussenhoven, C., Haan, J., Post, B., & Marsi, E. (1997). Pre-accentual pitch and speaker attitudes in Dutch. Language and Speech, 41:63–85.
Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7 (pp. 515–546). Berlin, Germany: Mouton de Gruyter.
Grabe, E., Post, B., & Watson, I. (1999). The acquisition of rhythmic patterns in English and French. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco (pp. 1201–1204).
Grabe, E., & Warren, P. (1995). Stress shift: Do speakers do it or do listeners use it? In: B. Connell & A. Arvaniti (Eds.), Papers in Laboratory Phonology IV. Phonology and Phonetic Evidence (pp. 95–110). Cambridge, UK: Cambridge University Press.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19:893–906.
Greenberg, S. (1996). The switchboard transcription project. In: Research Report #24, 1996 Large Vocabulary Continuous Speech Recognition Summer Workshop Technical Report Series. Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD.
Greenberg, S. (2006). A multi-tier framework for understanding spoken language. In: S. Greenberg & W. A. Ainsworth (Eds.), Listening to Speech: An Auditory Perspective (pp. 411–433). Mahwah, NJ: Erlbaum.
Greenfield, M. D. (2005). Mechanisms and evolution of communal sexual displays in arthropods and anurans. Advances in the Study of Behavior, 35:1–62.
Greenfield, M. D., Tourtellot, M. K., & Snedden, W. A. (1997). Precedence effects and the evolution of chorusing. Proceedings of the Royal Society of London B, 264:1355–1361.
Greenspan, R. (1995). Understanding the genetic construction of behavior. Scientific American, 272:72–78.
Greenspan, R. (2004). E pluribus unum, ex uno plura: Quantitative- and single-gene perspectives on the study of behavior. Annual Review of Neuroscience, 27:79–105.
Greenspan, R. J., & Tully, T. (1994). Group report: How do genes set up behavior? In: R. J. Greenspan & C. P. Kyriacou (Eds.), Flexibility and Constraint in Behavioral Systems (pp. 65–80). Chichester: John Wiley & Sons.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (1999). Absolute pitch: Prevalence, ethnic variation, and estimation of the genetic component. American Journal of Medical Genetics, 65:911–913.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2000). Early childhood music education and predisposition to absolute pitch: Teasing apart genes and environment. American Journal of Medical Genetics, 98:280–282.
Gregory, A. H. (1995). Perception and identification of Wagner’s Leitmotifs. Paper presented at the Society for Music Perception and Cognition conference, University of California, Berkeley.
Gregory, A. H. (1997). The roles of music in society: The ethnomusicological perspective. In: D. J. Hargreaves & A. C. North (Eds.), The Social Psychology of Music (pp. 123–140). Oxford, UK: Oxford University Press.
Gregory, A. H., & Varney, N. (1996). Cross-cultural comparisons in the affective response to music. Psychology of Music, 24:47–52.
Greig, J. (2003). The Music of Language: A Comparison of the Rhythmic Properties of Music and Language in Spanish, French, Russian and English. Unpublished bachelor’s thesis, Guildhall School of Music and Drama, London.
Grey, J. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61:1270–1277.
Griffiths, T. D. (2002). Central auditory processing disorders. Current Opinion in Neurobiology, 15:31–33.
Griffiths, T. D., Rees, A., Witton, C., Cross, P. M., Shakir, R. A., & Green, G. G. R. (1997). Spatial and temporal auditory processing deficits following right hemisphere infarction: A psychophysical study. Brain, 120:785–794.
Grodner, D. J., & Gibson, E. (2005). Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29:261–291.
Gross, H. (Ed.). (1979). The Structure of Verse (2nd ed.). New York: Ecco Press.
Grout, D. J., & Palisca, C. V. (2000). A History of Western Music (6th ed.). New York: W. W. Norton.
Grover, C., Jamieson, D. G., & Dobrovolsky, M. B. (1987). Intonation in English, French, and German: Perception and production. Language and Speech, 30:277–296.
Guenther, F. H. (2000). An analytical error invalidates the “depolarization” of the perceptual magnet effect. Journal of the Acoustical Society of America, 107:3576–3580.
Guenther, F. H., & Gjaja, M. N. (1996). The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America, 100:1111–1121.
Guenther, F. H., Husain, F. T., Cohen, M. A., & Shinn-Cunningham, B. G. (1999). Effects of auditory categorization and discrimination training on auditory perceptual space. Journal of the Acoustical Society of America, 106:2900–2912.
Guenther, F. H., Nieto-Castanon, A., Ghosh, S. S., & Tourville, J. A. (2004). Representation of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing Research, 47:46–57.
Gunter, T. C., Friederici, A. D., & Schriefers, H. (2000). Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction. Journal of Cognitive Neuroscience, 12:556–568.
Gussenhoven, C., & Rietveld, A. C. M. (1991). An experimental evaluation of two nuclear tone taxonomies. Linguistics, 29:423–449.
Gussenhoven, C., & Rietveld, A. C. M. (1992). Intonation contours, prosodic structure and preboundary lengthening. Journal of Phonetics, 20:283–303.
Gut, U. (2005). Nigerian English prosody. English World-Wide, 26:153–177.
Haarmann, H. J., & Kolk, H. H. J. (1991). Syntactic priming in Broca’s aphasics: Evidence for slow activation. Aphasiology, 5:247–263.
Hacohen, R., & Wagner, N. (1997). The communicative force of Wagner’s leitmotifs: Complementary relationships between their connotations and denotations. Music Perception, 14:445–476.
Haeberli, J. (1979). Twelve Nasca panpipes: A study. Ethnomusicology, 23:57–74.
Haesler, S., Wada, K., Nshdejan, A., Morrisey, E. E., Lints, T., Jarvis, E. D., & Scharff, C. (2004). FoxP2 expression in avian vocal learners and non-learners. Journal of Neuroscience, 24:3164–3175.
Hagoort, P., Brown, C. M., & Groothusen, J. (1993). The syntactic positive shift (SPS) as an ERP measure of syntactic processing. Language and Cognitive Processes, 8:439–483.
Hagoort, P., Brown, C. M., & Osterhout, L. (1999). The neurocognition of syntactic processing. In: C. M. Brown & P. Hagoort (Eds.), The Neurocognition of Language (pp. 273–316). Oxford, UK: Oxford University Press.
Hajda, J. A., Kendall, R. A., Carterette, E. C., & Harshberger, M. L. (1997). Methodological issues in timbre research. In: I. Deliege & J. A. Sloboda (Eds.), Perception and Cognition of Music (pp. 253–306). Hove, UK: Psychology Press.
Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. Proceedings of NAACL, 2:159–166.
Hall, M. D., & Pastore, R. E. (1992). Musical duplex perception: Perception of figurally good chords with subliminal distinguishing tones. Journal of Experimental Psychology: Human Perception and Performance, 18:752–762.
Hall, R. A., Jr. (1953, June). Elgar and the intonation of British English. The Gramophone, pp. 6–7. Reprinted in D. Bolinger (Ed.). (1972). Intonation: Selected Readings (pp. 282–285). Harmondsworth: Penguin.
Halle, M., & Idsardi, W. (1996). General properties of stress and metrical structure. In: J. Goldsmith (Ed.), The Handbook of Phonological Theory (pp. 403–443). Cambridge, MA: Blackwell.
Halle, M., & Vergnaud, J.-R. (1987). An Essay on Stress. Cambridge, MA: MIT Press.
Halliday, M. A. K. (1970). A Course in Spoken English: Intonation. London: Oxford University Press.
Handel, S. (1989). Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA: MIT Press.
Hannon, E. E., & Johnson, S. P. (2005). Infants use meter to categorize rhythms and melodies: Implications for musical structure learning. Cognitive Psychology, 50, 354–377.
Hannon, E. E., Snyder, J. S., Eerola, T., & Krumhansl, C. L. (2004). The role of melodic and temporal cues in perceiving musical meter. Journal of Experimental Psychology: Human Perception and Performance, 30:956–974.
Hannon, E. E., & Trehub, S. E. (2005). Metrical categories in infancy and adulthood. Psychological Science, 16:48–55.
Hanslick, E. (1854/1957). The Beautiful in Music (G. Cohen, Trans., 1885, 7th ed.). New York: Liberal Arts Press.
Harris, M. S., & Umeda, N. (1987). Difference limens for fundamental frequency contours in sentences. Journal of the Acoustical Society of America, 81:1139–1145.
Hart, B., & Risley, T. (1995). Meaningful Differences in the Everyday Experiences of American Children. Baltimore: P. H. Brooks.
Haspelmath, M., Dryer, M. W., Gil, D., & Comrie, B. (2005). The World Atlas of Language Structures. New York: Oxford University Press.
Hast, D. E., Cowdery, J. R., & Scott, S. (Eds.). (1999). Exploring the World of Music. Dubuque, IA: Kendall Hunt. (Quote from interview with Simon Shaheen in video program #6: Melody.)
Hatten, R. (2004). Interpreting Musical Gestures, Topics, and Tropes: Mozart, Beethoven, Schubert. Bloomington: Indiana University Press.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298:1569–1579.
Hauser, M. D., & Fowler, C. A. (1992). Fundamental frequency declination is not unique to human speech: Evidence from nonhuman primates. Journal of the Acoustical Society of America, 91:363–369.
Hauser, M. D., & McDermott, J. (2003). The evolution of the music faculty: A comparative perspective. Nature Neuroscience, 6:663–668.
Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream in a nonhuman primate: Statistical learning in cotton-top tamarins. Cognition, 78:B53–B64.
Hawkins, S., & Barrett-Jones, S. (in preparation). On phonetic and phonemic categories: An experimental and theoretical appraisal of the perceptual magnet effect.
Hay, J. S. F., & Diehl, R. L. (2007). Perception of rhythmic grouping: Testing the iambic/trochaic law. Perception and Psychophysics, 69:113–122.
Hayes, B. (1989). The prosodic hierarchy in meter. In: P. Kiparsky & G. Youmans (Eds.), Phonetics and Phonology, Vol. 1: Rhythm and Meter (pp. 201–260). San Diego, CA: Academic Press.
Hayes, B. (1995a). Two Japanese children’s songs. Unpublished manuscript.
Hayes, B. (1995b). Metrical Stress Theory: Principle and Case Studies. Chicago: University of Chicago Press.
Heaton, P., Allen, R., Williams, K., Cummins, O., & Happe, F. (in press). Do social and cognitive deficits curtail musical understanding? Evidence from Autism and Down syndrome. British Journal of Developmental Psychology.
Hebert, S., & Cuddy, L. L. (2006). Music-reading deficiencies and the brain. Advances in Cognitive Psychology, 2:199–206.
Helmholtz, H. von. (1954). On the Sensations of Tone as a Physiological Basis for the Theory of Music (2nd ed., A. J. Ellis, Trans.). New York: Dover. (Original work published 1885)
Henshilwood, C., d’Errico, F., Vanhaeren, M., van Niekerk, K., & Jacobs, Z. (2004). Middle Stone Age shell beads from South Africa. Science, 304:404.
Henthorn, T., & Deutsch, D. (2007). Ethnicity versus environment: Comment on ‘Early childhood music education and predisposition to absolute pitch: Teasing apart genes and environment’ by Peter K. Gregersen, Elena Kowalsky, Nina Kohn, and Elizabeth West Marvin [2000]. American Journal of Medical Genetics Part A, 143A:102–103.
Hepper, P. G. (1991). An examination of fetal learning before and after birth. The Irish Journal of Psychology, 12:95–107.
Herman, L. M., & Uyeyama, R. K. (1999). The dolphin’s grammatical competency: Comments on Kako (1999). Animal Learning and Behavior, 27:18–23.
Hermerén, G. (1988). Representation, truth, and the languages of the arts. In: V. Rantala, L. Rowell, & E. Tarasti (Eds.), Essays on the Philosophy of Music, Acta Philosophica Fennica (Vol. 43, pp. 179–209). Helsinki: The Philosophical Society of Finland.
Hermes, D. (2006). Stylization of pitch contours. In: S. Sudhoff et al. (Eds.), Methods in Empirical Prosody Research (pp. 29–61). Berlin: Walter de Gruyter.
Hermes, D., & van Gestel, J. C. (1991). The frequency scale of speech intonation. Journal of the Acoustical Society of America, 90:97–102.
Herzog, G. (1926). Helen H. Roberts & Diamond Jenness, “Songs of the Copper Eskimo” (book review). Journal of American Folklore, 39:218–225.
Herzog, G. (1934). Speech-melody and primitive music. The Musical Quarterly, 20:452–466.
Herzog, G. (1945). Drum-signaling in a West African tribe. Word, 1:217–238.
Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48:246–268.
Hevner, K. (1937). The affective value of pitch and tempo in music. American Journal of Psychology, 49:621–630.
Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92:67–99.
Hinton, L. (1984). Havasupai Songs: A Linguistic Perspective. Tubingen, Germany: Gunter Narr Verlag.
Hinton, L., Nichols, J., & Ohala, J. J. (Eds.). (1994). Sound Symbolism. Cambridge, UK: Cambridge University Press.
Hirsh-Pasek, K., Kemler Nelson, D. G., Jusczyk, P. W., Cassidy, K. W., Druss, B., & Kennedy, L. (1987). Clauses are perceptual units for young infants. Cognition, 26:269–286.
Hirst, D., & Di Cristo, A. (Eds.). (1998). Intonation Systems: A Survey of Twenty Languages. Cambridge, UK: Cambridge University Press.
Hobbs, J. R. (1985). On the Coherence and Structure of Discourse. CSLI Technical Report 85-37. Stanford, CA: CSLI.
Hobbs, J. R. (1990). Literature and Cognition. CSLI Lecture Notes 21. Stanford, CA: CSLI.
Hockett, C. A., & Altmann, S. (1968). A note on design features. In: T. A. Sebeok (Ed.), Animal Communication: Techniques of Study and Results of Research (pp. 61–72). Bloomington: Indiana University Press.
Hoequist, C. (1983). Durational correlates of linguistic rhythm categories. Phonetica, 40:19–31.
Hogan, J. T., & Manyeh, M. (1996). A study of Kono tone spacing. Phonetica, 53:221–229.
Hollander, J. (2001). Rhyme’s Reason: A Guide to English Verse (3rd ed.). New Haven, CT: Yale University Press.
Holleran, S., Butler, D., & Jones, M. R. (1995). Perceiving implied harmony: The role of melodic and harmonic context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21:737–753.
Holloway, C. (2001). All Shook Up: Music, Passion, and Politics. Dallas, TX: Spence.
Holm, J. (2000). An Introduction to Pidgins and Creoles. Cambridge, UK: Cambridge University Press.
Holst, I. (1962). Tune. London: Faber & Faber.
Holt, L. L., Lotto, A. J., & Diehl, R. L. (2004). Auditory discontinuities interact with categorization: Implications for speech perception. Journal of the Acoustical Society of America, 116:1763–1773.
Honey, J. (1989). Does Accent Matter? The Pygmalion Factor. London: Faber & Faber.
Honing, H. (2005). Is there a perception-based alternative to kinematic models of tempo rubato? Music Perception, 23:79–85.
Honorof, D. N., & Whalen, D. H. (2005). Perception of pitch location within a speaker’s F0 range. Journal of the Acoustical Society of America, 117:2193–2200.
Horton, T. (2001). The compositionality of tonal structures: A generative approach to the notion of musical meaning. Musicae Scientiae, 5:131–160.
House, D. (1990). Tonal Perception in Speech. Lund, Sweden: Lund University Press.
Howard, D., Rosen, S., & Broad, V. (1992). Major/minor triad identification and discrimination by musically trained and untrained listeners. Music Perception, 10:205–220.
Howe, M. J. A., Davidson, J. W., & Sloboda, J. A. (1998). Innate talents: Reality or myth? Behavioral and Brain Sciences, 21:399–442.
Hudson, A., & Holbrook, A. (1982). Fundamental frequency characteristics of young black adults. Spontaneous speaking and oral reading. Journal of Speech and Hearing Research, 25:25–28.
Hughes, D. (2000). No nonsense: The logic and power of acoustic-iconic mnemonic systems. British Journal of Ethnomusicology, 9:95–122.
Hulse, S. H., Bernard, D. J., & Braaten, R. F. (1995). Auditory discrimination of chord-based spectral structures by European starlings (Sturnus vulgaris). Journal of Experimental Psychology: General, 124:409–423.
Hulse, S. H., Takeuchi, A. H., & Braaten, R. F. (1992). Perceptual invariances in the comparative psychology of music. Music Perception, 10:151–184.
Huron, D. (2003). Is music an evolutionary adaptation? In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 57–75). Cambridge, MA: MIT Press.
Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press.
Huron, D., & Ollen, J. (2003). Agogic contrast in French and English themes: Further support for Patel and Daniele. Music Perception, 21:267–271.
Huron, D., & Parncutt, R. (1993). An improved model of tonality perception incorporating pitch salience and echoic memory. Psychomusicology, 12:154–171.
Huron, D., & Royal, M. (1996). What is melodic accent? Converging evidence from musical practice. Music Perception, 13:489–516.
Husain, F., Tagamets, M.-A., Fromm, S., Braun, A., & Horwitz, B. (2004). Relating neural dynamics for auditory object processing to neuroimaging activity: A computational modeling and fMRI study. NeuroImage, 21:1701–1720.
Hutchins, S. (2003). Harmonic functional categorization. Poster presented at the Society for Music Perception and Cognition conference, June 16–19, Las Vegas.
Huttenlocher, J., Haight, W., Bryk, A., Seltzer, M., et al. (1991). Early vocabulary growth: Relation to language input and gender. Developmental Psychology, 27:236–248.
Huttenlocher, P. (2002). Neural Plasticity. Cambridge, MA: Harvard University Press.
Hyde, K. L., & Peretz, I. (2004). Brains that are out of tune but in time. Psychological Science, 15:356–360.
Hyde, K. L., Peretz, I., & Cuvelier, H. (2004). Do possessors of absolute pitch have special pitch acuity? In: S. D. Lipscomb et al. (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL, 2004 (pp. 741–742). Adelaide, Australia: Causal Productions.
Hyde, K. L., Zatorre, R., Griffiths, T. D., Lerch, J. P., & Peretz, I. (2006). Morphometry of the amusic brain: A two-site study. Brain, 129:2562–2570.
Hyman, L. M. (1973). The feature [Grave] in phonological theory. Journal of Phonetics, 1:329–337.
Hyman, L. M. (2001). Tone systems. In: M. Haspelmath et al. (Eds.), Language Typology and Language Universals: An International Handbook (pp. 1367–1380). Berlin, Germany: Walter de Gruyter.
Ilie, G., & Thompson, W. F. (2006). A comparison of acoustic cues in music and speech for three dimensions of affect. Music Perception, 23:319–330.
Imaizumi, S., Mori, K., Kiritani, S., Kawashima, R., Sugiura, M., et al. (1997). Vocal identification of speaker and emotion activates different brain regions. NeuroReport, 8:2809–2812.
Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. Journal of the Acoustical Society of America, 124:2263–2271.
Iversen, J. R., Repp, B., & Patel, A. D. (2009). Top-down control of rhythm perception modulates early auditory responses. Annals of the New York Academy of Sciences, 1169:58–73.
Iverson, J., Rees, H., & Revlin, R. (1989). The effect of music on the personal relevance of lyrics. Psychology: A Journal of Human Behaviour, 26:15–22.
Iverson, P., & Kuhl, P. K. (2000). Perceptual magnet and phoneme boundary effects in speech perception: Do they arise from a common mechanism? Perception and Psychophysics, 62:874–886.
Iverson, P., Kuhl, P., Akahane-Yamada, R., Diesch, E., Tohkura, Y., Kettermann, A., & Siebert, C. (2003). A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition, 87:B47–B57.
Izumi, A. (2000). Japanese monkeys perceive sensory consonance of chords. Journal of the Acoustical Society of America, 108:3073–3078.
Jackendoff, R. (1977). Review of The Unanswered Question by Leonard Bernstein. Language, 53:883–894.
Jackendoff, R. (1989). A comparison of rhythmic structures in music and language. In: P. Kiparsky & G. Youmans (Eds.), Phonetics and Phonology, Vol. 1: Rhythm and Meter (pp. 15–44). San Diego, CA: Academic Press.
Jackendoff, R. (1991). Musical parsing and musical affect. Music Perception, 9:199–230.
Jackendoff, R. (2002). Foundations of Language. New York: Oxford University Press.
Jackendoff, R., & Lerdahl, F. (2006). The capacity for music: What is it, and what’s special about it? Cognition, 100:33–72.
Jaeger, F., Fedorenko, E., & Gibson, E. (2005). Dissociation between production and comprehension complexity. Poster presented at the 18th CUNY Sentence Processing Conference, University of Arizona.
Jairazbhoy, N. A. (1995). The Rags of North Indian Music (Rev. ed.). Bombay, India: Popular Prakashan. (Original work published 1971)
Jairazbhoy, N. A., & Stone, A. W. (1963). Intonation in present-day North Indian classical music. Bulletin of the School of Oriental and African Studies, 26:110–132.
Jakobson, R. (1971). Selected Writings (Vol. 3, pp. 704–705). The Hague, The Netherlands: Mouton.
Jakobson, R., Fant, G., & Halle, M. (1952). Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates. Acoustics Laboratory, MIT, Technical Report No. 13. Cambridge, MA: MIT.
Janata, P., Birk, J. L., Tillmann, B., & Bharucha, J. J. (2003). Online detection of tonal pop-out in modulating contexts. Music Perception, 20:283–305.
Janata, P., Birk, J. L., Van Horn, J. D., Leman, M., Tillmann, B., & Bharucha, J. J. (2002). The cortical topography of tonal structures underlying Western music. Science, 298:2167–2170.
Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for behaviors related to sequencing and music. Nature Neuroscience, 6:682–687.
Jarvis, E. D. (2004). Learned birdsong and the neurobiology of human language. Annals of the New York Academy of Sciences, 1016:749–777.
Jaszczolt, K. M. (2002). Semantics and Pragmatics: Meaning in Language and Discourse. London: Longmans.
Johnson, J. S., & Newport, E. L. (1989). Critical period effects in second language learning: The influence of maturational state on the acquisition of English as a second language. Cognitive Psychology, 21:60–99.
Johnson, K. (1997). Acoustic and Auditory Phonetics. Cambridge, MA: Blackwell.
Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123:155–163.
Johnston, P. A. (1994). Brain Physiology and Music Cognition. Ph.D. dissertation, University of California, San Diego.
Johnstone, T., & Scherer, K. R. (1999). The effects of emotion on voice quality. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, pp. 2029–2032.
Johnstone, T., & Scherer, K. R. (2000). Vocal communication of emotion. In: M. Lewis & J. M. Haviland-Jones (Eds.), Handbook of Emotions (2nd ed., pp. 220–235). New York: Guilford Press.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83:323–355.
Jones, M. R. (1987). Dynamic pattern structure in music: Recent theory and research. Perception and Psychophysics, 41:621–634.
Jones, M. R. (1993). Dynamics of musical patterns: How do melody and rhythm fit together? In: T. J. Tighe & W. J. Dowling (Eds.), Psychology and Music: The Understanding of Melody and Rhythm (pp. 67–92). Hillsdale, NJ: Lawrence Erlbaum.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96:459–491.
Jones, M. R., & Pfordresher, P. Q. (1997). Tracking musical patterns using joint accent structure. Canadian Journal of Experimental Psychology, 51:271–290.
Jones, M. R., & Ralston, J. T. (1991). Some influences of accent structure on melody recognition. Memory and Cognition, 19:8–20.
Jones, S. (1995). Folk Music of China. New York: Oxford University Press.
Jongsma, M. L. A., Desain, P., & Honing, H. (2004). Rhythmic context influences the auditory evoked potentials of musicians and non-musicians. Biological Psychology, 66:129–152.
Jun, S.-A. (2003). Intonation. In: L. Nadel (Ed.), Encyclopedia of Cognitive Science (Vol. 2, pp. 618–624). London: Nature Group.
Jun, S.-A. (2005). Prosodic typology. In: S.-A. Jun (Ed.), Prosodic Typology: The Phonology of Intonation and Phrasing (pp. 430–458). Oxford, UK: Oxford University Press.
Jun, S.-A., & Fougeron, C. (2000). A phonological model of French intonation. In: A. Botinis (Ed.), Intonation: Analysis, Modeling and Technology (pp. 209–242). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Jun, S.-A., & Fougeron, C. (2002). Realizations of accentual phrase in French intonation. Probus, 14:147–172.
Jungers, M., Palmer, C., & Speer, S. R. (2002). Time after time: The coordinating influence of tempo in music and speech. Cognitive Processing, 1–2:21–35.
Jurafsky, D. (2003). Probabilistic modeling in psycholinguistics: Linguistic comprehension and production. In: R. Bod, J. Hay, & S. Jannedy (Eds.), Probabilistic Linguistics. Cambridge, MA: MIT Press.
Jusczyk, P., & Krumhansl, C. (1993). Pitch and rhythmic patterns affecting infants’ sensitivity to musical phrase structure. Journal of Experimental Psychology: Human Perception and Performance, 19:627–640.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129:770–814.
Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and questionnaire study of everyday listening. Journal of New Music Research, 33:217–238.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and Emotion: Theory and Research. Oxford, UK: Oxford University Press.
Justus, T., & Hutsler, J. J. (2005). Fundamental issues in the evolutionary psychology of music: Assessing innateness and domain-specificity. Music Perception, 23:1–27.
Justus, T. C., & Bharucha, J. J. (2001). Modularity in musical processing: The automaticity of harmonic priming. Journal of Experimental Psychology: Human Perception and Performance, 27:1000–1011.
Kaan, E., Harris, T., Gibson, T., & Holcomb, P. J. (2000). The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15:159–201.
Kaan, E., & Swaab, T. Y. (2002). The brain circuitry of syntactic comprehension. Trends in Cognitive Sciences, 6:350–356.
Kalmus, H., & Fry, D. B. (1980). On tune deafness (dysmelodia): Frequency, development, genetics and musical background. Annals of Human Genetics, 43:369–382.
Kameoka, A., & Kuriyagawa, M. (1969a). Consonance theory part I: Consonance of dyads. Journal of the Acoustical Society of America, 45:1451–1459.
Kameoka, A., & Kuriyagawa, M. (1969b). Consonance theory part II: Consonance of complex tones and its calculation method. Journal of the Acoustical Society of America, 45:1460–1469.
Karmiloff-Smith, A. (1992). Beyond Modularity: A Developmental Perspective on Cognitive Science. Cambridge, MA: MIT Press.
Karmiloff-Smith, A., Brown, J. H., Grice, S., & Paterson, S. (2003). Dethroning the myth: Cognitive dissociations and innate modularity in Williams syndrome. Developmental Neuropsychology, 23:229–244.
Karno, M., & Konečni, V. J. (1992). The effects of structural intervention in the first movement of Mozart’s Symphony in G-Minor, K. 550, on aesthetic preference. Music Perception, 10:63–72.
Karzon, R. G. (1985). Discrimination of polysyllabic sequences by one-to-four-month-old infants. Journal of Experimental Child Psychology, 39:326–342.
Kassler, J. C. (2005). Representing speech through musical notation. Journal of Musicological Research, 24:227–239.
Kazanina, N., Phillips, C., & Idsardi, W. (2006). The influence of meaning on the perception of speech sounds. Proceedings of the National Academy of Sciences USA, 103:11381–11386.
Keane, E. (2006). Rhythmic characteristics of colloquial and formal Tamil. Language and Speech, 49:299–332.
Kehler, A. (2002). Coherence, Reference, and the Theory of Grammar. Stanford, CA: CLSI Publications.
Kehler, A. (2004). Discourse coherence. In: L. R. Horn & G. Ward (Eds.), Handbook of Pragmatics (pp. 241–265). Oxford, UK: Basil Blackwell.
Keiler, A. (1978). Bernstein’s The Unanswered Question and the problem of musical competence. The Musical Quarterly, 64:195–222.
Kelly, M. H., & Bock, J. K. (1988). Stress in time. Journal of Experimental Psychology: Human Perception and Performance, 14:389–403.
Kendall, R., & Carterette, E. C. (1990). The communication of musical expression. Music Perception, 8:129–163.
Kessler, E. J., Hansen, C., & Shepard, R. (1984). Tonal schemata in the perception of music in Bali and the West. Music Perception, 2:131–165.
Kim, S. (2003). The role of post-lexical tonal contours in word segmentation. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 495–498.
King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30:580–602.
Kippen, J. (1988). The Tabla of Lucknow: A Cultural Analysis of a Musical Tradition. Cambridge, UK: Cambridge University Press.
Kisilevsky, B. S., Hains, S. M. J., Lee, K., Xie, X., Huang, H., Ye, H. H., et al. (2003). Effects of experience on fetal voice recognition. Psychological Science, 14:220–224.
Kivy, P. (1980). The Corded Shell: Reflections on Musical Expression. Princeton, NJ: Princeton University Press.
Kivy, P. (1990). Music Alone: Philosophical Reflections on the Purely Musical Experience. Ithaca, New York: Cornell University Press.
Kivy, P. (2002). Introduction to a Philosophy of Music. Oxford, UK: Oxford University Press.
Klatt, D. (1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America, 59:1208–1221.
Klatt, D. (1979). Synthesis by rule of segmental durations in English sentences. In: B. Lindblom & S. Ohman (Eds.), Frontiers of Speech Communication Research (pp. 287–299). New York: Academic Press.
Klima, E., & Bellugi, U. (1979). The Signs of Language. Cambridge, MA: Harvard University Press.
Kluender, K. R., Diehl, R. L., & Killeen, P. R. (1987). Japanese quail can learn phonetic categories. Science, 237:1195–1197.
Kmetz, J., Finscher, L., Schubert, G., Schepping, W., & Bohlman, P. V. (2001). Germany. In: S. Sadie (Ed.), The New Grove Dictionary of Music and Musicians (Vol. 9, pp. 708–744). New York: Grove.
Knightly, L. M., Jun, S.-A., Oh, J. S., & Au, T. K.-F. (2003). Production benefits of childhood overhearing. Journal of the Acoustical Society of America, 114:465–474.
Knösche, T. R., Neuhaus, C., Haueisen, J., Alter, K., Maess, B., Witte, O. W., & Friederici, A. D. (2005). The perception of phrase structure in music. Human Brain Mapping, 24:259–273.
Koelsch, S., Grossmann, T., Gunter, T., Hahne, A., & Friederici, A. (2003). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience, 15:683–693.
Koelsch, S., Gunter, T. C., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: “Non-musicians” are musical. Journal of Cognitive Neuroscience, 12:520–541.
Koelsch, S., Gunter, T. C., von Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage, 17:956–966.
Koelsch, S., Gunter, T. C., Wittforth, M., & Sammler, D. (2005). Interaction between syntax processing in language and music: An ERP study. Journal of Cognitive Neuroscience, 17:1565–1577.
Koelsch, S., Jentschke, S., Sammler, D., & Mietchen, D. (2007). Untangling syntactic and sensory processing: An ERP study of music perception. Psychophysiology, 44:476–490.
Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. D. (2004). Music, language, and meaning: Brain signatures of semantic processing. Nature Neuroscience, 7:302–307.
Koelsch, S., & Mulder, J. (2002). Electric brain responses to inappropriate harmonies during listening to expressive music. Clinical Neurophysiology, 113:862–869.
Koelsch, S., & Siebel, W. A. (2005). Toward a neural basis of music perception. Trends in Cognitive Sciences, 9:578–584.
Kolk, H. H. (1998). Disorders of syntax in aphasia: Linguistic-descriptive and processing approaches. In: B. Stemmer & H. A. Whitaker (Eds.), Handbook of Neurolinguistics (pp. 249–260). San Diego, CA: Academic Press.
Kolk, H. H., & Friederici, A. D. (1985). Strategy and impairment in sentence understanding by Broca’s and Wernicke’s aphasics. Cortex, 21:47–67.
Konieczny, L. (2000). Locality and parsing complexity. Journal of Psycholinguistic Research, 29:627–645.
Koon, N. K. (1979). The five pentatonic modes in Chinese folk music. Chinese Music, 2:10–13.
Koopman, C., & Davies, S. (2001). Musical meaning in a broader perspective. The Journal of Aesthetics and Art Criticism, 59:261–273.
Kotz, S. A., Frisch, S., von Cramon, D. Y., & Friederici, A. D. (2003). Syntactic language processing: ERP lesion data on the role of the basal ganglia. Journal of the International Neuropsychological Society, 9:1053–1060.
Kraljic, T., & Samuel, A. (2005). Perceptual learning for speech: Is there a return to normal? Cognitive Psychology, 51:141–178.
Kramer, L. (2002). Musical Meaning: Toward a Critical History. Berkeley: University of California Press.
Kristofferson, A. B. (1980). A quantal step function in duration discrimination. Perception and Psychophysics, 27:300–306.
Kronman, U., & Sundberg, J. (1987). Is the musical retard an allusion to physical motion? In: A. Gabrielsson (Ed.), Action and Perception in Rhythm and Music (pp. 57–68). Stockholm: Royal Swedish Academy of Music.
Krumhansl, C. L. (1979). The psychological representation of musical pitch in a tonal context. Cognitive Psychology, 11:346–374.
Krumhansl, C. L. (1989). Why is musical timbre so hard to understand? In: S. Nielzén & O. Olsson (Eds.), Structure and Perception of Electroacoustic Sound and Music (pp. 43–54). New York: Excerpta Medica.
Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. New York: Oxford University Press.
Krumhansl, C. L. (1991). Melodic structure: Theoretical and empirical descriptions. In: J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, Language, Speech and Brain (pp. 269–283). London: Macmillan.
Krumhansl, C. L. (1992). Internal representations for music perception and performance. In: M. R. Jones & S. Holleran (Eds.), Cognitive Bases of Musical Communication (pp. 197–211). Washington, DC: American Psychological Association.
Krumhansl, C. L. (1995a). Effects of musical context on similarity and expectancy. Systematische Musikwissenschaft (Systematic Musicology), 3:211–250.
Krumhansl, C. L. (1995b). Music psychology and music theory: Problems and prospects. Music Theory Spectrum, 17:53–90.
Krumhansl, C. L. (1996). A perceptual analysis of Mozart’s Piano Sonata K. 282: Segmentation, tension, and musical ideas. Music Perception, 13:401–432.
Krumhansl, C. L. (1997). An exploratory study of musical emotions and psychophysiology. Canadian Journal of Experimental Psychology, 51:336–353.
Krumhansl, C. L. (1998). Topic in music: An empirical study of memorability, openness, and emotion in Mozart’s String Quintet in C Major and Beethoven’s String Quartet in A Minor. Music Perception, 16:119–134.
Krumhansl, C. L. (2000). Tonality induction: A statistical approach applied cross-culturally. Music Perception, 17:461–479.
Krumhansl, CL (2005)。调性的认知——正如我们今天所知道的那样。新音乐研究杂志, 33:253-268。
Krumhansl, C. L. (2005). The cognition of tonality—as we know it today. Journal of New Music Research, 33:253–268.
Krumhansl, C. L., Bharucha, J. J., & Kessler, E. J. (1982). Perceived harmonic structure of chords in three related musical keys. Journal of Experimental Psychology: Human Perception and Performance, 8:24–36.
Krumhansl, C. L., & Jusczyk, P. (1990). Infants’ perception of phrase structure in music. Psychological Science, 1:70–73.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89:334–368.
Krumhansl, C. L., Louhivuori, J., Toiviainen, P., Järvinen, T., & Eerola, T. (1999). Melodic expectancy in Finnish folk hymns: Convergence of behavioral, statistical, and computational approaches. Music Perception, 17:151–197.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000). Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks. Cognition, 76:13–58.
Kugler, K., & Savage-Rumbaugh, S. (2002, June). Rhythmic drumming by Kanzi, an adult male bonobo (Pan paniscus) at the language research center. Paper presented at the 25th meeting of the American Society of Primatologists, Oklahoma University, Oklahoma.
Kuhl, P. K. (1979). Speech perception in early infancy: Perceptual constancy for spectrally dissimilar vowel categories. Journal of the Acoustical Society of America, 66:1668–1679.
Kuhl, P. K. (1983). Perception of auditory equivalence classes for speech in early infancy. Infant Behavior and Development, 6:263–285.
Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50:93–107.
Kuhl, P. K. (1993). Innate predispositions and the effects of experience in speech perception: The native language magnet theory. In: J. Morton (Ed.), Developmental Neurocognition: Speech and Face Processing in the First Year of Life (pp. 259–274). Dordrecht, The Netherlands: Kluwer.
Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews (Neuroscience), 5:831–843.
Kuhl, P. K., Andruski, J., Chistovich, I., Chistovich, L., Kozhevnikova, E., Ryskina, V., Stolyarova, E., Sundberg, U., & Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277:684–686.
Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (in press). Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B.
Kuhl, P. K., & Miller, J. D. (1975). Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosive consonants. Science, 190:69–72.
Kuhl, P. K., Tsao, F.-M., & Liu, H.-M. (2003). Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences, USA, 100:9096–9101.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., & Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255:606–608.
Kuperberg, G. R., Lakshmanan, B. M., Caplan, D. N., & Holcomb, P. J. (2006). Making sense of discourse: An fMRI study of causal inferencing across sentences. NeuroImage, 33:343–361.
Kusumoto, K., & Moreton, E. (1997, December). Native language determines parsing of nonlinguistic rhythmic stimuli. Poster presented at the 134th meeting of the Acoustical Society of America, San Diego, CA.
Kutas, M., & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307:161–163.
Labov, W. (1966). The Social Stratification of English in New York City. Washington, DC: Center for Applied Linguistics.
Ladd, D. R. (1986). Intonational phrasing: The case for recursive prosodic structure. Phonology Yearbook, 1:53–74.
Ladd, D. R. (1987). Review of Bolinger 1986. Language, 63:637–643.
Ladd, D. R. (1996). Intonational Phonology. Cambridge, UK: Cambridge University Press.
Ladd, D. R. (2001). Intonation. In: M. Haspelmath, E. König, W. Oesterreicher, & W. Raible (Eds.), Language Typology and Language Universals (Vol. 2, pp. 1380–1390). Berlin, Germany: Walter de Gruyter.
Ladd, D. R. (forthcoming). Intonational Phonology (2nd ed.). Cambridge, UK: Cambridge University Press.
Ladd, D. R., Faulkner, D., Faulkner, H., & Schepman, A. (1999). Constant “segmental anchoring” of F0 movements under changes in speech rate. Journal of the Acoustical Society of America, 106:1543–1554.
Ladd, D. R., & Morton, R. (1997). The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics, 25:313–342.
Ladd, D. R., Silverman, K. E. A., Tolkmitt, F., Bergmann, G., & Scherer, K. R. (1985). Evidence for the independent function of intonation contour type, voice quality, and F0 range in signaling speaker affect. Journal of the Acoustical Society of America, 78:435–444.
Ladefoged, P. (1964). A Phonetic Study of West African Languages. London: Cambridge University Press.
Ladefoged, P. (1975). A Course in Phonetics. New York: Harcourt Brace Jovanovich.
Ladefoged, P. (2001). Vowels and Consonants: An Introduction to the Sounds of Languages. Malden, MA: Blackwell.
Ladefoged, P. (2006). A Course in Phonetics (5th ed.). Boston: Thomson.
Ladefoged, P., & Maddieson, I. (1996). The Sounds of the World’s Languages. Oxford, UK: Blackwell.
Lai, C. S. L., Gerrelli, D., Monaco, A. P., Fisher, S. E., & Copp, A. J. (2003). FOXP2 expression during brain development coincides with adult sites of pathology in a severe speech and language disorder. Brain, 126:2455–2462.
Lalitte, P., & Bigand, E. (2006). Music in the moment: Revisiting the effect of large scale structure. Perceptual and Motor Skills, 103:811–828.
Lane, R., & Nadel, L. (Eds.). (2000). Cognitive Neuroscience of Emotion. New York: Oxford University Press.
Langacker, R. W. (1988). A view of linguistic semantics. In: B. Rudzka-Ostyn (Ed.), Topics in Cognitive Linguistics (pp. 49–60). Amsterdam/Philadelphia: J. Benjamins.
Langer, S. (1942). Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art. Cambridge, MA: Harvard University Press.
Large, E. W. (2000). On synchronizing movements to music. Human Movement Science, 19:527–566.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How we track time-varying events. Psychological Review, 106:119–159.
Large, E. W., & Palmer, C. (2002). Perceiving temporal regularity in music. Cognitive Science, 26:1–37.
Large, E. W., Palmer, C., & Pollack, J. B. (1995). Reduced memory representations for music. Cognitive Science, 19:53–96.
Lau, E., Stroud, C., Plesch, S., & Phillips, C. (2006). The role of structural prediction in rapid syntactic analysis. Brain and Language, 98:74–88.
Laukka, P., Juslin, P. N., & Bresin, R. (2005). A dimensional approach to vocal expression of emotion. Cognition and Emotion, 19:633–653.
Lecanuet, J. (1996). Prenatal auditory experience. In: I. Deliège & J. Sloboda (Eds.), Musical Beginnings: Origins and Development of Musical Competence (pp. 3–34). Oxford, UK: Oxford University Press.
Lecarme, J. (1991). Focus en somali: Syntaxe et interpretation. Linguistique Africaine, 7:33–63.
LeDoux, J. (1996). The Emotional Brain. New York: Simon & Schuster.
Lee, C. S., & Todd, N. P. McA. (2004). Toward an auditory account of speech rhythm: Application of a model of the auditory “primal sketch” to two multi-language corpora. Cognition, 93:225–254.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics, 5:253–263.
Lehiste, I. (1990). An acoustic analysis of the metrical structure of orally produced Lithuanian poetry. Journal of Baltic Studies, 21:145–155.
Lehiste, I. (1991). Speech research: An overview. In: J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, Language, Speech and Brain (pp. 98–107). London: Macmillan.
Lehiste, I., & Fox, R. A. (1992). Perception of prominence by Estonian and English listeners. Language and Speech, 35:419–434.
Leman, M. (1995). Music and Schema Theory. Berlin, Germany: Springer.
Leman, M. (2000). An auditory model of the role of short-term memory in probe-tone ratings. Music Perception, 17:481–509.
Lenneberg, E. (1967). Biological Foundations of Language. New York: Wiley.
Lerdahl, F. (2001). Tonal Pitch Space. New York: Oxford University Press.
Lerdahl, F. (2003). The sounds of poetry viewed as music. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 413–429). New York: Oxford University Press.
Lerdahl, F., & Halle, J. (1991). Some lines of poetry viewed as music. In: J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, Language, Speech and Brain (pp. 34–47). London: Macmillan.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception, 24:329–366.
Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Levelt, W. J. M. (1999). Models of word production. Trends in Cognitive Sciences, 3:223–232.
Levinson, G. (1997). Music in the Moment. Ithaca, New York: Cornell University Press.
Lévi-Strauss, C. (1964/1969). The Raw and the Cooked: Introduction to a Science of Mythology (J. Weightman & D. Weightman, Trans.). New York: Harper and Row.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception and Psychophysics, 56:414–423.
Levitin, D. J., Cole, K., Chiles, M., Lai, Z., Lincoln, A., & Bellugi, U. (2004). Characterizing the musical phenotype in individuals with Williams Syndrome. Child Neuropsychology, 10:223–247.
Levitin, D. J., & Menon, V. (2003). Musical structure is processed in “language” areas of the brain: A possible role for Brodmann Area 47 in temporal coherence. NeuroImage, 20:2141–2152.
Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: Perception, coding, and controversies. Trends in Cognitive Sciences, 9:26–33.
Levitin, D. J., & Zatorre, R. J. (2003). On the nature of early music training and absolute pitch: A reply to Brown, Sachs, Cammuso, and Folstein. Music Perception, 21:105–110.
Levy, M. (1982). Intonation in North Indian Music: A Select Comparison of Theories With Contemporary Practice. New Delhi, India: Biblia Impex.
Levy, R. (in press). Expectation-based syntactic comprehension. Cognition.
Lewis, R. L., Vasishth, S., & Van Dyke, J. A. (2006). Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences, 10:447–454.
Li, G. (2001). Onomatopoeia and Beyond: A Study of the Luogu Jing of the Beijing Opera. Ph.D. dissertation, University of California, Los Angeles.
Liberman, A. (1996). Speech: A Special Code. Cambridge, MA: MIT Press.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74:431–461.
Liberman, A. M., Harris, K. S., Hoffman, H., & Griffith, B. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54:358–368.
Liberman, M. (1975). The Intonational System of English. Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge.
Liberman, M., & Pierrehumbert, J. (1984). Intonational invariance under changes in pitch range and length. In: M. Aronoff & R. Oerhle (Eds.), Language Sound Structure (pp. 157–233). Cambridge, MA: MIT Press.
Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8:249–336.
Lieberman, P. (1967). Intonation, Perception, and Language. Cambridge, MA: MIT Press.
Lieberman, P. (1984). The Biology and Evolution of Language. Cambridge, MA: Harvard University Press.
Lieberman, P. (2000). Human Language and Our Reptilian Brain: The Subcortical Bases of Speech, Syntax, and Thought. Cambridge, MA: Harvard University Press.
Lieberman, P., Klatt, D. H., & Wilson, W. H. (1969). Vocal tract limitations on the vowel repertoires of rhesus monkeys and other nonhuman primates. Science, 164:1185–1187.
Liégeois, F., Baldeweg, T., Connelly, A., Gadian, D. G., Mishkin, M., & Vargha-Khadem, F. (2003). Language fMRI abnormalities associated with FOXP2 gene mutation. Nature Neuroscience, 6:1230–1237.
Liégeois-Chauvel, C., Peretz, I., Babai, M., Laguitton, V., & Chauvel, P. (1998). Contribution of different cortical areas in the temporal lobes to music processing. Brain, 121:1853–1867.
Liljencrants, J., & Lindblom, B. (1972). Numerical simulations of vowel quality systems: The role of perceptual contrast. Language, 48:839–862.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In: W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modeling (pp. 403–439). Dordrecht, The Netherlands: Kluwer.
Lively, S. E., & Pisoni, D. B. (1997). On prototypes and phonetic categories: A critical assessment of the perceptual magnet effect in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 23:1665–1679.
Lochy, A., Hyde, K. L., Parisel, S., Van Hyfte, S., & Peretz, I. (2004). Discrimination of speech prosody in congenital amusia. Poster presented at the 2004 meeting of the Cognitive Neuroscience Society, San Francisco.
Locke, D. (1982). Principles of offbeat timing and cross-rhythm in Southern Ewe dance drumming. Ethnomusicology, 26:217–246.
Locke, D. (1990). Drum Damba: Talking Drum Lessons. Reno, NV: White Cliffs Media.
Locke, D., & Agbeli, G. K. (1980). A study of the drum language in Adzogbo. Journal of the International Library of African Music, 6:32–51.
Locke, J. L. (1993). The Child’s Path to Spoken Language. Cambridge, MA: Harvard University Press.
Lockhead, G. R., & Byrd, R. (1981). Practically perfect pitch. Journal of the Acoustical Society of America, 70:387–389.
Löfqvist, A., Baer, T., McGarr, N. S., & Story, R. S. (1989). The cricothyroid muscle in voicing control. Journal of the Acoustical Society of America, 85:1314–1321.
London, J. (2002). Cognitive constraints on metric systems: Some observations and hypotheses. Music Perception, 19:529–550.
London, J. (2004). Hearing in Time: Psychological Aspects of Musical Meter. New York: Oxford University Press.
London, J. M. (1995). Some examples of complex meters and their implications for models of metric perception. Music Perception, 13:59–78.
Long, K. D., Kennedy, G., & Balaban, E. (2001). Transferring an inborn auditory perceptual predisposition with interspecies brain transplants. Proceedings of the National Academy of Sciences, USA, 98:5862–5867.
Long, K. D., Kennedy, G., Salbaum, M., & Balaban, E. (2002). Auditory stimulus-induced changes in immediate-early gene expression related to an inborn perceptual predisposition. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 188:25–38.
Longacre, R. (1952). Five phonemic pitch levels in Trique. Acta Linguistica 7:62–82.
Longhi, E. (2003). The Temporal Structure of Mother-Infant Interactions in Musical Contexts. Ph.D. dissertation, University of Edinburgh, Scotland.
Lotto, A. J., Kluender, K. R., & Holt, L. L. (1998). Depolarizing the perceptual magnet effect. Journal of the Acoustical Society of America, 103:3648–3655.
Low, E. L., Grabe, E., & Nolan, F. (2000). Quantitative characterisations of speech rhythm: Syllable-timing in Singapore English. Language and Speech, 43:377–401.
Luce, P. A., & Lyons, E. A. (1998). Specificity of memory representations for spoken words. Memory and Cognition, 26:708–715.
Luce, P. A., & McLennan, C. T. (2005). Spoken word recognition: The challenge of variation. In: D. B. Pisoni & R. E. Remez (Eds.), The Handbook of Speech Perception (pp. 591–609). Malden, MA: Blackwell.
Luria, A. R., Tsvetkova, L. S., & Futer, D. S. (1965). Aphasia in a composer. Journal of the Neurological Sciences, 2:288–292.
Lynch, E. D., Lee, M. K., Morrow, J. E., Welcsh, P. L., Leon, P. E., & King, M. C. (1997). Nonsyndromic deafness DFNA1 associated with mutation of a human homolog of the Drosophila gene diaphanous. Science, 278:1315–1318.
Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning. Perception and Psychophysics, 52:599–608.
Lynch, M. P., Eilers, R. E., Oller, K., & Urbano, R. C. (1990). Innateness, experience, and music perception. Psychological Science, 1:272–276.
Lynch, M. P., Short, L. B., & Chua, R. (1995). Contributions of experience to the development of musical processing in infancy. Developmental Psychobiology, 28:377–398.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109:35–54.
MacLarnon, A., & Hewitt, G. (1999). The evolution of human speech: The role of enhanced breathing control. American Journal of Physical Anthropology, 109:341–363.
Maddieson, I. (1978). Universals of tone. In: J. Greenberg (Ed.), Universals of Language, Vol 2: Phonology (pp. 335–365). Stanford, CA: Stanford University Press.
Maddieson, I. (1984). Patterns of Sounds. Cambridge, UK: Cambridge University Press.
Maddieson, I. (1991). Tone spacing. York Papers in Linguistics, 15:149–175.
Maddieson, I. (1999). In search of universals. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, pp. 2521–2528.
Maddieson, I. (2005). Tone. In: M. Haspelmath, M. S. Dryer, D. Gil, & B. Comrie (Eds.), The World Atlas of Language Structures (pp. 58–61). New York: Oxford University Press.
Maess, B., Koelsch, S., Gunter, T., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience, 4:540–545.
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience, 18:199–211.
Marcus, G. F., & Fisher, S. E. (2003). FOXP2 in focus: What can genes tell us about speech and language? Trends in Cognitive Sciences, 7:257–262.
Marcus, M., & Hindle, D. (1990). Description theory and intonational boundaries. In: G. T. M. Altmann, (Ed.). Cognitive Models of Speech Processing (pp. 483–512). Cambridge, MA: MIT Press.
Marin, O. S. M., & Perry, D. W. (1999). Neurological aspects of music perception and performance. In: D. Deutsch (Ed.), The Psychology of Music (2nd ed., pp. 653–724). San Diego, CA: Academic Press.
Marler, P. (1970). A comparative approach to vocal learning: Song development in white-crowned sparrows. Journal of Comparative and Physiological Psychology, 71:1–25.
Marler, P. (1991). Song-learning behavior: The interface with neuroethology. Trends in Neurosciences, 14:199–206.
Marler, P. (1997). Three models of song learning: Evidence from behavior. Journal of Neurobiology, 33:501–516.
Marler, P. (1999). On innateness: Are sparrow songs “learned” or “innate”? In: M. D. Hauser & M. Konishi (Eds.), The Design of Animal Communication (pp. 293–318). Cambridge, MA: MIT Press.
Marler, P. (2000). Origins of music and speech: Insights from animals. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 31–48). Cambridge, MA: MIT Press.
Marler, P., & Peters, S. (1977). Selective vocal learning in a sparrow. Science, 198:519–521.
Marslen-Wilson, W. (1975). Sentence perception as an interactive parallel process. Science, 189:382–386.
Martin, J. G. (1972). Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review, 79:487–509.
Marvin, E., & Brinkman, A. (1999). The effect of modulation and formal manipulation on perception of tonal closure. Music Perception, 16:389–408.
Masataka, N. (1999). Preference for infant-directed singing in 2-day-old hearing infants of deaf parents. Developmental Psychology, 35:1001–1005.
Masataka, N. (2006). Preference for consonance over dissonance by hearing newborns of deaf parents and of hearing parents. Developmental Science, 9:46–50.
Mason, R. A., & Just, M. A. (2004). How the brain processes causal inferences in text. Psychological Science, 15:1–7.
Matell, M. S., & Meck, W. H. (2000). Neuropsychological mechanisms of interval timing behaviour. BioEssays, 22:94–103.
Mavlov, L. (1980). Amusia due to rhythm agnosia in a musician with left hemisphere damage: A non-auditory supramodal defect. Cortex, 16:331–338.
Mayberry, R. I., & Eichen, E. (1991). The long-lasting advantage of learning sign-language in childhood: Another look at the critical period for language acquisition. Journal of Memory and Language, 30:486–512.
Mayberry, R. I., & Lock, E. (2003). Age constraints on first versus second language acquisition: Evidence for linguistic plasticity and epigenesis. Brain and Language, 87:369–383.
Maye, J., & Weiss, D. J. (2003). Statistical cues facilitate infants’ discrimination of difficult phonetic contrasts. In: B. Beachley et al. (Eds.), BUCLD 27 Proceedings (pp. 508–518). Somerville, MA: Cascadilla Press.
Maye, J., Weiss, D. J., & Aslin, R. (in press). Statistical phonetic learning in infants: Facilitation and feature generalization. Developmental Science.
Maye, J., Werker, J., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82:B101–B111.
McAdams, S. (1996). Audition: Cognitive psychology of music. In: R. Llinas & P. S. Churchland (Eds.), The Mind-Brain Continuum: Sensory Processes (pp. 251–279). Cambridge, MA: MIT Press.
McAdams, S., Beauchamp, J. M., & Meneguzzi, S. (1999). Discrimination of musical instrument sounds resynthesized with simplified spectrotemporal parameters. Journal of the Acoustical Society of America, 105:882–897.
McAdams, S., & Cunibile, J. C. (1992). Perception of timbral analogies. Philosophical Transactions of the Royal Society, London, Series B, 336:383–389.
McAdams, S., & Matzkin, D. (2001). Similarity, invariance, and musical variation. Annals of the New York Academy of Sciences, 930:62–76.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., & Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58:177–192.
McDermott, J., & Hauser, M. (2004). Are consonant intervals music to their ears? Spontaneous acoustic preferences in a nonhuman primate. Cognition, 94:B11–B21.
McDermott, J., & Hauser, M. D. (2005). The origins of music: Innateness, development, and evolution. Music Perception, 23:29–59.
McKinney, M. F., & Delgutte, B. (1999). A possible neurophysiological basis of the octave enlargement effect. Journal of the Acoustical Society of America, 106:2679–2692.
McLucas, A. (2001). Tune families. In: S. Sadie (Ed.), The New Grove Dictionary of Music and Musicians (Vol. 25, pp. 882–884). New York: Grove.
McMullen, E., & Saffran, J. R. (2004). Music and language: A developmental comparison. Music Perception, 21:289–311.
McNeill, W. H. (1995). Keeping Together in Time: Dance and Drill in Human History. Cambridge, MA: Harvard University Press.
McNellis, M. G., & Blumstein, S. E. (2001). Self-organizing dynamics of lexical access in normals and aphasics. Journal of Cognitive Neuroscience, 13:151–170.
McQueen, J. M., Norris, D., & Cutler, A. (2006). The dynamic nature of speech perception. Language and Speech, 49:101–112.
Mehler, J., Dommergues, J. Y., Frauenfelder, U., & Segui, J. (1981). The syllable’s role in speech segmentation. Journal of Verbal Learning and Verbal Behavior, 20:298–305.
Mehler, J., Dupoux, E., Nazzi, T., & Dehaene-Lambertz, D. (1996). Coping with linguistic diversity: The infant’s viewpoint. In: J. L. Morgan & D. Demuth (Eds.), Signal to Syntax (pp. 101–116). Mahwah, NJ: Lawrence Erlbaum.
Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., & Amiel-Tison, C. (1988). A precursor to language acquisition in young infants. Cognition, 29:143–178.
Ménard, L., Schwartz, J.-L., & Boe, L.-J. (2004). Role of vocal tract morphology in speech development: Perceptual targets and sensorimotor maps for French synthesized vowels from birth to adulthood. Journal of Speech, Language, and Hearing Research, 47:1059–1080.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage, 28:175–184.
Menon, V., Levitin, D. J., Smith, B. K., Lembke, A., Krasnow, B. D., Glazer, D., Glover, G. H., & McAdams, S. (2002). Neural correlates of timbre change in harmonic sounds. NeuroImage, 17:1742–1754.
Merker, B. (2000). Synchronous chorusing and human origins. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 315–327). Cambridge, MA: MIT Press.
Merker, B. (2002). Music: The missing Humboldt system. Musicae Scientiae, 1:3–21.
Merker, B. (2005). The conformal motive in birdsong, music, and language: An introduction. Annals of the New York Academy of Sciences, 1060:17–28.
Mertens, P. (2004a). The prosogram: Semi-automatic transcription of prosody based on a tonal perception model. In: B. Bel & I. Marlien (Eds.), Proceedings of Speech Prosody 2004, Nara (Japan), 23–26 March.
Mertens, P. (2004b). Un outil pour la transcription de la prosodie dans les corpus oraux. Traitement Automatique des Langues, 45:109–130.
Meyer, J. (2004). Bioacoustics of human whistled languages: An alternative approach to the cognitive processes of language. Anais da Academia Brasileira de Ciencias [Annals of the Brazilian Academy of Sciences], 76:405–412.
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Meyer, L. B. (1973). Explaining Music: Essays and Explorations. Berkeley: University of California Press.
Miller, G. (2000). Evolution of human music through sexual selection. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 329–360). Cambridge, MA: MIT Press.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63:81–97.
Miller, L. K. (1989). Musical Savants: Exceptional Skill in the Mentally Retarded. Hillsdale, NJ: Erlbaum.
Miranda, R. A., & Ullman, M. T. (in press). Double dissociation between rules and memory in music: An event-related potential study. NeuroImage.
Mithen, S. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind and Body. London: Weidenfeld & Nicolson.
Monelle, R. (2000). The Sense of Music: Semiotic Essays. Princeton, NJ: Princeton University Press.
Moon, C., Cooper, R. P., & Fifer, W. P. (1993). Two-day-olds prefer their native language. Infant Behavior and Development, 16:495–500.
Moore, B. C. J. (1997). Aspects of auditory processing related to speech perception. In: W. J. Hardcastle & J. Laver (Eds.), The Handbook of Phonetic Sciences (pp. 539–565). Oxford, UK: Blackwell.
Moore, B. C. J. (2001). Loudness, pitch, and timbre. In: E. B. Goldstein (Ed.), Blackwell Handbook of Perception (pp. 408–436). Malden, MA: Blackwell.
Moore, J. W., Choi, J.-S., & Brunzell, D. H. (1998). Predictive timing under temporal uncertainty: The time derivative model of the conditioned response. In: D. A. Rosenbaum & C. E. Collyer (Eds.), Timing of Behavior: Neural, Psychological, and Computational Perspectives (pp. 3–34). Cambridge, MA: MIT Press.
Morgan, J. L., Meier, R. P., & Newport, E. L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases in the acquisition of language. Cognitive Psychology, 19:498–550.
Morley, I. (2003). The Evolutionary Origins and Archaeology of Music: An Investigation into the Prehistory of Human Musical Capacities and Behaviours. Ph.D. dissertation, University of Cambridge.
Morrongiello, B. A. (1984). Auditory temporal pattern perception in 6- and 12-month-old infants. Developmental Psychology, 20:441–448.
Morrongiello, B. A., & Trehub, S. E. (1987). Age-related changes in auditory temporal perception. Journal of Experimental Child Psychology, 44:413–426.
Morton, D. (1974). Vocal tones in traditional Thai music. Selected Reports of the Institute for Ethnomusicology, 2:166–175.
Morton, J., Marcus, S., & Frankish, C. (1976). Perceptual centers (P centers). Psychological Review, 83:405–408.
Most, T., Amir, O., & Tobin, Y. (2000). The Hebrew vowels: Raw and normalized acoustic data. Language and Speech, 43:295–308.
Münte, T. F., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of neuroplasticity. Nature Reviews (Neuroscience), 3:473–478.
Myers, R. (1968). (Ed.). Richard Strauss and Romain Rolland. London: Calder & Boyars.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, NJ: Erlbaum.
Näätänen, R., Lehtokoski, A., Lennes, M., et al. (1997). Language-specific phoneme representations revealed by electrical and magnetic brain responses. Nature, 385:432–434.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125:826–859.
Nadel, J., Carchon, I., Kervella, C., Marcelli, D., & Réserbat-Plantey, D. (1999). Expectancies for social contingency in 2-month-olds. Developmental Science, 2:164–173.
Nakata, T., & Trehub, S. E. (2004). Infants’ responsiveness to maternal speech and singing. Infant Behavior and Development, 27:455–464.
Narayanan, S., & Jurafsky, D. (1998). Bayesian models of human sentence processing. In Proceedings of the Twelfth Annual Meeting of the Cognitive Science Society.
Narayanan, S., & Jurafsky, D. (2002). A Bayesian model predicts human parse preference and reading time in sentence processing. Advances in Neural Information Processing Systems, 14:59–65.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.
Nattiez, J. J. (1990). Music and Discourse: Toward a Semiology of Music (C. Abbate, Trans.). Princeton, NJ: Princeton University Press.
Nattiez, J. J. (2003). La signification comme parametre musical. In: J. J. Nattiez (Ed.), Musique: Une Encyclopédie pour Le XXIe Siecle (Vol. 2, pp. 256–289). Arles, France: Acte Sud/Cité de la Musique.
Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination in newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24:756–777.
Nazzi, T., Jusczyk, P. W., & Johnson, E. K. (2000). Language discrimination by English learning 5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language, 43:1–19.
Nearey, T. (1978). Phonetic Features for Vowels. Bloomington: Indiana University Linguistics Club.
Nespor, M. (1990). On the rhythm parameter in phonology. In: I. Rocca (Ed.), Logical Issues in Language Acquisition (pp. 157–175). Dordrecht, The Netherlands: Foris Publications.
Nespor, M., & Vogel, I. (1983). Prosodic structure above the word. In: A. Cutler & D. R. Ladd (Eds.), Prosody: Models and Measurements. Berlin, Germany: Springer-Verlag.
Nespor, M., & Vogel, I. (1989). On clashes and lapses. Phonology, 6:69–116.
Nettl, B. (1954). North American Indian musical styles. Memoirs of the American Folklore Society (Vol. 45). Philadelphia: American Folklore Society.
Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and musical culture. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 463–472). Cambridge, MA: MIT Press.
Neubauer, J. (1986). The Emancipation of Music From Language: Departure From Mimesis in Eighteenth-Century Aesthetics. New Haven, CT: Yale University Press.
Newport, E. (2002). Language development, critical periods in. In: L. Nadel (Ed.), Encyclopedia of Cognitive Science (pp. 737–740). London: MacMillan Publishers Ltd./Nature Group.
Newport, E. L., Hauser, M. D., Spaepen, G., & Aslin, R. N. (2004). Learning at a distance: II. Statistical learning of non-adjacent dependencies in a non-human primate. Cognitive Psychology, 49:85–117.
Nicholson, K. G., Baum, S., Kilgour, A., Koh, C. K., Munhall, K. G., & Cuddy, L. L. (2003). Impaired processing of prosodic and musical patterns after right hemisphere damage. Brain and Cognition, 52:382–389.
Nielzén, S., & Cesarec, Z. (1981). On the perception of emotional meaning in music. Psychology of Music, 9:17–31.
Nketia, J. H. K. (1974). The Music of Africa. New York: W. W. Norton.
Noad, M. J., Cato, D. H., Bryden, M. M., Jenner, M.-N., & Jenner, K. C. S. (2000). Cultural revolution in whale songs. Nature, 408:537.
Nolan, F. (2003). Intonational equivalence: An experimental evaluation of pitch scales. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 771–774.
Nord, L., Kruckenberg, A., & Fant, G. (1990). Some timing studies of prose, poetry and music. Speech Communication, 9:477–483.
North, A. C., Hargreaves, D. J., & McKendrick, J. (1999). The influence of in-store music on wine selections. Journal of Applied Psychology, 84:271–276.
North, A. C., Hargreaves, D. J., & O’Neill, S. A. (2000). The importance of music to adolescents. British Journal of Educational Psychology, 70:255–272.
Norton, A., Winner, E., Cronin, K., Overy, K., Lee, D. J., & Schlaug, G. (2005). Are there pre-existing neural, cognitive, or motoric markers for musical ability? Brain and Cognition, 59:124–134.
Nyklíček, I., Thayer, J. F., & van Doornen, L. J. P. (1997). Cardiorespiratory differentiation of musically-induced emotions. Journal of Psychophysiology, 11:304–321.
Ochs, E., & Schieffelin, B. (1984). Language acquisition and socialization. In: R. A. Shweder & R. A. Levine (Eds.), Culture Theory: Essays on Mind, Self, and Emotion (pp. 276–320). New York: Cambridge University Press.
Oh, J. S., Jun, S.-A., Knightly, L. M., & Au, T. K.-F. (2003). Holding on to childhood language memory. Cognition, 86:53–64.
Ohala, J. J. (1983). Cross-language use of pitch: An ethological view. Phonetica, 40:1–18.
Ohala, J. J. (1984). An ethological perspective on common cross-language utilization of F0 of voice. Phonetica, 41:1–16.
Ohala, J. J. (1994). The frequency code underlies the sound-symbolic use of voice pitch. In: L. Hinton, J. Nichols, & J. Ohala (Eds.), Sound Symbolism (pp. 325–347). Cambridge, UK: Cambridge University Press.
Ohgushi, K. (2002). Comparison of dotted rhythm expression between Japanese and Western pianists. In: C. Stevens et al. (Eds.), Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney (pp. 250–253). Adelaide, Australia: Causal Productions.
Ohl, F. W., Schulze, H., Scheich, H., & Freeman, W. J. (2000). Spatial representation of frequency-modulated tones in gerbil auditory cortex revealed by epidural electrocorticography. Journal of Physiology-Paris, 94:549–554.
Oller, K., & Eilers, R. E. (1988). The role of audition in infant babbling. Child Development, 59:441–466.
Oohashi, T., Kawai, N., Honda, M., Nakamura, S., Morimoto, M., Nishina, E., & Maekawa, T. (2002). Electroencephalographic measurement of possession trance in the field. Clinical Neurophysiology, 113:435–445.
Oram, N., & Cuddy, L. L. (1995). Responsiveness of Western adults to pitch-distributional information in melodic sequences. Psychological Research, 57:103–118.
Ortony, A., & Turner, T. J. (1990). What’s basic about basic emotions? Psychological Review, 97:315–331.
Osterhout, L., & Holcomb, P. J. (1992). Event-related potentials elicited by syntactic anomaly. Journal of Memory and Language, 31:785–806.
Osterhout, L., & Holcomb, P. J. (1993). Event-related potential and syntactic anomaly: Evidence of anomaly detection during the perception of continuous speech. Language and Cognitive Processes, 8:413–437.
Otake, T., Hatano, G., Cutler, A., & Mehler, J. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language, 32:258–278.
Overy, K. (2003). Dyslexia and music: From timing deficits to musical intervention. Annals of the New York Academy of Sciences, 999:497–505.
Paavilainen, P., Jaramillo, M., Näätänen, R., & Winkler, I. (1999). Neuronal populations in the human brain extracting invariant relationships from acoustic variance. Neuroscience Letters 265:179–182.
Pallier, C., Sebastian-Gallés, N., Felguera, T., Christophe, A., & Mehler, J. (1993). Attentional allocation within syllabic structure of spoken words. Journal of Memory and Language, 32:373–389.
Palmer, C. (1996). On the assignment of structure in music performance. Music Perception, 14:23–56.
Palmer, C. (1997). Music performance. Annual Review of Psychology, 48:115–138.
Palmer, C., Jungers, M., & Jusczyk, P. (2001). Episodic memory for musical prosody. Journal of Memory and Language, 45:526–545.
Palmer, C., & Kelly, M. H. (1992). Linguistic prosody and musical meter in song. Journal of Memory and Language, 31:525–542.
Palmer, C., & Krumhansl, C. L. (1990). Mental representations for musical meter. Journal of Experimental Psychology: Human Perception and Performance, 16:728–741.
Palmer, C., & Pfordresher, P. Q. (2003). Incremental planning in sequence production. Psychological Review, 110:683–712.
Panksepp, J. (1998). Affective Neuroscience: The Foundations of Human and Animal Emotions. New York: Oxford University Press.
Pannekamp, A., Toepel, U., Alter, K., Hahne, A., & Friederici, A. D. (2005). Prosody driven sentence processing: An ERP study. Journal of Cognitive Neuroscience, 17:407–421.
Pantaleoni, H. (1985). On the Nature of Music. Oneonta, New York: Welkin Books.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392:811–814.
Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. Neuroreport, 12:169–174.
Papousek, M. (1996). Intuitive parenting: A hidden source of musical stimulation in infancy. In: I. Deliége & J. Sloboda (Eds.), Musical Beginnings: Origins and Development of Musical Competence (pp. 88–112). New York: Oxford University Press.
Papousek, M., Bornstein, M. H., Nuzzo, C., Papousek, H., & Symmes, D. (1990). Infant responses to prototypical melodic contours in parental speech. Infant Behavior and Development, 13:539–545.
Park, T., & Balaban, E. (1991). Relative salience of species maternal calls in neonatal gallinaceous birds: A direct comparison of Japanese quail (Coturnix coturnix japonica) and domestic chickens (Gallus gallus domesticus). Journal of Comparative Psychology, 105:45–54.
Parncutt, R. (1989). Harmony: A Psychoacoustical Approach. Berlin: Springer-Verlag.
Parncutt, R. (1994). A perceptual model of pulse salience and metrical accent in musical rhythms. Music Perception, 11:409–464.
Parncutt, R., & Bregman, A. (2000). Tone profiles following short chord progressions: Top-down or bottom-up? Music Perception, 18:25–58.
Partch, H. (1974). Genesis of a Music. New York: Da Capo Press.
Partch, H. (1991). Bitter Music. (T. McGeary, Ed.). Urbana: University of Illinois Press.
Partee, B. (1995). Lexical semantics and compositionality. In: D. Osherson (Gen. Ed.), An Invitation to Cognitive Science. Part I: Language (2nd ed., pp. 311–360). Cambridge, MA: MIT Press.
Pascual-Leone, A. (2003). The brain that makes music and is changed by it. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 396–409). New York: Oxford University Press.
Pastore, R. E., Schmuckler, M. A., Rosenblum, L., & Szczesiul, R. (1983). Duplex perception with musical stimuli. Perception and Psychophysics, 33:469–474.
Patel, A. D. (2003a). A new approach to the cognitive neuroscience of melody. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 325–345). Oxford, UK: Oxford University Press.
Patel, A. D. (2003b). Language, music, syntax, and the brain. Nature Neuroscience, 6:674–681.
Patel, A. D. (2005). The relationship of music to the melody of speech and to syntactic processing disorders in aphasia. Annals of the New York Academy of Sciences, 1060:59–70.
Patel, A. D. (2006a). An empirical method for comparing pitch patterns in spoken and musical melodies: A comment on J. G. S. Pearl’s “Eavesdropping with a master: Leoš Janáček and the music of speech.” Empirical Musicology Review, 1:166–169.
Patel, A. D. (2006b). Music and the Mind. Lecture given in the UCSD Grey Matters series. Available to view at: http://www.ucsd.tv/search-details.asp?showID=11189 or via YouTube.
Patel, A. D. (2006c). Musical rhythm, linguistic rhythm, and human evolution. Music Perception, 24:99–104.
Patel, A. D., & Balaban, E. (2000). Temporal patterns of human cortical activity reflect tone sequence structure. Nature, 404:80–84.
Patel, A. D., & Balaban, E. (2001). Human pitch perception is reflected in the timing of stimulus-related cortical activity. Nature Neuroscience, 4:839–844.
Patel, A. D., & Daniele, J. R. (2003a). An empirical comparison of rhythm in language and music. Cognition, 87:B35–B45.
Patel, A. D., & Daniele, J. R. (2003b). Stress-timed vs. syllable-timed music? A comment on Huron and Ollen (2003). Music Perception, 21:273–276.
Patel, A. D., Foxton, J. M., & Griffiths, T. D. (2005). Musically tone-deaf individuals have difficulty discriminating intonation contours extracted from speech. Brain and Cognition, 59:310–313.
Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. (1998). Processing syntactic relations in language and music: An event-related potential study. Journal of Cognitive Neuroscience, 10:717–733.
Patel, A. D., & Iversen, J. R. (2003). Acoustic and perceptual comparison of speech and drum sounds in the North Indian tabla tradition: An empirical study of sound symbolism. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 925–928.
Patel, A. D., & Iversen, J. R. (2006). A non-human animal can drum a steady beat on a musical instrument. In: M. Baroni, A. R. Addessi, R. Caterina, & M. Costa (Eds.), Proceedings of the 9th International Conference on Music Perception and Cognition (ICMPC9), Bologna/Italy, p. 477.
Patel, A. D., & Iversen, J. R. (2007). The linguistic benefits of musical abilities. Trends in Cognitive Sciences, 11:369–372.
Patel, A. D., Iversen, J. R., Chen, Y. C., & Repp, B. H. (2005). The influence of metricality and modality on synchronization with a beat. Experimental Brain Research, 163:226–238.
Patel, A. D., Iversen, J. R., & Rosenberg, J. C. (2006). Comparing the rhythm and melody of speech and music: The case of British English and French. Journal of the Acoustical Society of America, 119:3034–3047.
Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in agrammatic Broca’s aphasia. Aphasiology, 22:776–780.
Patel, A. D., Löfqvist, A., & Naito, W. (1999). The acoustics and kinematics of regularly-timed speech: A database and method for the study of the P-center problem. Proceedings of the 14th International Congress of Phonetic Sciences, 1999, San Francisco, 1:405–408.
Patel, A. D., Peretz, I., Tramo, M., & Labrecque, R. (1998). Processing prosodic and musical patterns: A neuropsychological investigation. Brain and Language, 61:123–144.
Patel, A. D., Wong, M., Foxton, J., Lochy, A., & Peretz, I. (in press). Linguistic intonation perception deficits in musical tone deafness. Music Perception.
Payne, K. (2000). The progressively changing songs of humpback whales: A window on the creative process in a wild animal. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 135–150). Cambridge, MA: MIT Press.
Pearl, J. (2006). Eavesdropping with a master: Leoš Janáček and the music of speech. Empirical Musicology Review, 1:131–165.
Penman, J., & Becker, J. (submitted). Religious ecstatics, “deep listeners,” and musical emotion.
Pepperberg, I. M. (2002). The Alex Studies: Cognitive and Communicative Abilities of Grey Parrots. Cambridge, MA: Harvard University Press.
Peretz, I. (1989). Clustering in music: An appraisal of task factors. International Journal of Psychology, 24:157–178.
Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged patients. Brain, 113:1185–1205.
Peretz, I. (1993). Auditory atonalia for melodies. Cognitive Neuropsychology, 10:21–56.
Peretz, I. (1996). Can we lose memory for music? A case of music agnosia in a nonmusician. Journal of Cognitive Neuroscience, 8:481–496.
Peretz, I. (2001). Listen to the brain: A biological perspective on musical emotions. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 105–134). Oxford, UK: Oxford University Press.
Peretz, I. (2006). The nature of music from a biological perspective. Cognition, 100:1–32.
Peretz, I., Ayotte, J., Zatorre, R., Mehler, J., Ahad, P., Penhune, V., & Jutras, B. (2002). Congenital amusia: A disorder of fine-grained pitch discrimination. Neuron, 33:185–191.
Peretz, I., Brattico, E., & Tervaniemi, M. (2005). Abnormal electrical brain responses to pitch in congenital amusia. Annals of Neurology, 58:478–482.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, 6:688–691.
Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants, immediacy, and isolation after brain damage. Cognition, 68:111–141.
Peretz, I., Gagnon, L., Hébert, S., & Macoir, J. (2004). Singing in the brain: Insights from cognitive neuropsychology. Music Perception, 21:373–390.
Peretz, I., Gaudreau, D., & Bonnel, A.-M. (1998). Exposure effects on music preference and recognition. Memory and Cognition, 26:884–902.
Peretz, I., & Hyde, K. L. (2003). What is specific to music processing? Insights from congenital amusia. Trends in Cognitive Sciences, 7:362–367.
Peretz, I., Kolinsky, R., Tramo, M., Labrecque, R., Hublet, C., Demeurisse, G., & Belleville, S. (1994). Functional dissociations following bilateral lesions of auditory cortex. Brain, 117:1283–1302.
Perlman, M. (1997). The ethnomusicology of performer interaction in improvised ensemble music: A review of two recent studies. Music Perception, 15:99–118.
Perlman, M., & Krumhansl, C. L. (1996). An experimental study of internal interval standards in Javanese and Western musicians. Music Perception, 14:95–116.
Pesetsky, D. (2007). Music syntax is language syntax. Paper presented at “Language and Music as Cognitive Systems,” May 11–13, Cambridge University, UK.
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24:175–184.
Peterson, G. E., & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America, 32:693–703.
Petitto, L. A., Holowka, S., Sergio, L. E., Levy, B., & Ostry, D. J. (2004). Baby hands that move to the rhythm of language: Hearing babies acquiring sign language babble silently on the hands. Cognition, 93:43–73.
Petitto, L. A., Holowka, S., Sergio, L., & Ostry, D. (2001). Language rhythms in babies’ hand movements. Nature, 413:35–36.
Petitto, L. A., & Marentette, P. (1991). Babbling in the manual mode: Evidence for the ontogeny of language. Science, 251:1493–1496.
Phillips, C., Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., McGinnis, M., & Roberts, T. P. L. (2000). Auditory cortex accesses phonological categories: An MEG mismatch study. Journal of Cognitive Neuroscience, 12:1038–1055.
Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat in music: Movement influences rhythm perception in infants. Science, 308:1430.
Pierrehumbert, J. (1979). The perception of fundamental frequency declination. Journal of the Acoustical Society of America, 66:363–369.
Pierrehumbert, J. (1980). The Phonetics and Phonology of English Intonation. Ph.D. dissertation, Massachusetts Institute of Technology. [Reprinted by Indiana University Linguistics Club, 1987].
Pierrehumbert, J. (2000). Tonal elements and their alignment. In: M. Horne (Ed.), Prosody: Theory and Experiment (pp. 11–36). Dordrecht, The Netherlands: Kluwer.
Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In: P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in Communication (pp. 271–311). Cambridge, MA: MIT Press.
Pierrehumbert, J., & Steele, J. (1989). Categories of tonal alignment in English. Phonetica, 46:181–196.
Pike, K. L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press.
Pilon, R. (1981). Segmentation of speech in a foreign language. Journal of Psycholinguistic Research, 10:113–121.
Pinker, S. (1997). How the Mind Works. London: Allen Lane.
Pinker, S., & Jackendoff, R. (2005). The faculty of language: What’s special about it? Cognition, 95:201–236.
Piske, T., MacKay, I., & Flege, J. (2001). Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics, 29:191–215.
Pisoni, D. B. (1977). Identification and discrimination of the relative onset time of two component tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America, 61:1352–1361.
Pisoni, D. B. (1997). Some thoughts on “normalization” in speech perception. In: K. Johnson & J. W. Mullennix (Eds.), Talker Variability in Speech Processing (pp. 9–32). San Diego, CA: Academic Press.
Piston, W. (1987). Harmony (5th ed., revised and expanded by M. DeVoto). New York: Norton.
Pitt, M. A., & Samuel, A. G. (1990). The use of rhythm in attending to speech. Journal of Experimental Psychology: Human Perception and Performance, 16:564–573.
Plack, C. J., Oxenham, A. J., Fay, R. R., & Popper, A. N. (Eds.). (2005). Pitch: Neural Coding and Perception. Berlin, Germany: Springer.
Plantinga, J., & Trainor, L. J. (2005). Memory for melody: Infants use a relative pitch code. Cognition, 98:1–11.
Plomp, R., & Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38:548–560.
Poeppel, D. (2001). Pure word deafness and the bilateral processing of the speech code. Cognitive Science, 25:679–693.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as “asymmetric sampling in time.” Speech Communication, 41:245–255.
Polka, L., Colantonio, C., & Sundara, M. (2001). A cross-language comparison of /d/~/D/ discrimination: Evidence for a new developmental pattern. Journal of the Acoustical Society of America, 109:2190–2201.
Poulin-Charronnat, B., Bigand, E., & Koelsch, S. (2006). Processing of musical syntax tonic versus subdominant: An event-related potential study. Journal of Cognitive Neuroscience, 18:1545–1554.
Poulin-Charronnat, B., Bigand, E., Madurell, F., & Peereman, R. (2005). Musical structure modulates semantic priming in vocal music. Cognition, 94:B67–78.
Pouthas, V. (1996). The development of the perception of time and temporal regulation of action in infants and children. In: I. Deliege & J. Sloboda (Eds.), Musical Beginnings (pp. 115–141). Oxford, UK: Oxford University Press.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2:411–440.
Povel, D. J., & Jansen, E. (2001). Perceptual mechanisms in music processing. Music Perception, 19:169–197.
Povel, D. J., & Okkerman, H. (1981). Accents in equitone sequences. Perception and Psychophysics, 30:565–572.
Powers, H. S. (1980). Language models and music analysis. Ethnomusicology, 24:1–61.
Pressing, J. (2002). Black Atlantic rhythm: Its computational and transcultural foundations. Music Perception, 19:285–310.
Price, C. J., & Friston, K. J. (1997). Cognitive conjunction: A new approach to brain activation experiments. NeuroImage, 5:261–270.
Price, P., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90:2956–2970.
Profita, J., & Bidder, T. G. (1988). Perfect pitch. American Journal of Medical Genetics, 29:763–771.
Provine, R. R. (1992). Contagious laughter: Laughter is a sufficient stimulus for laughs and smiles. Bulletin of the Psychonomic Society, 30:1–4.
Pullum, G. (1991). The Great Eskimo Vocabulary Hoax, and Other Irreverent Essays on the Study of Language. Chicago: University of Chicago Press.
Pyers, J. (2006). Constructing the social mind: Language and false-belief understanding. In: N. Enfield & S. C. Levinson (Eds.), Roots of Human Sociality: Culture, Cognition, and Interaction. Oxford: Berg.
Racette, A., Bard, C., & Peretz, I. (2006). Making non-fluent aphasics speak: Sing along! Brain, 129:2571–2584.
Raffman, D. (1993). Language, Music, and Mind. Cambridge, MA: MIT Press.
Rakowski, A. (1990). Intonation variants of musical intervals in isolation and in musical contexts. Psychology of Music, 18:60–72.
Rameau, J.-P. (1722/1971). Treatise on Harmony (P. Gossett, Trans.). New York: Dover.
Ramus, F. (2002a). Acoustic correlates of linguistic rhythm: Perspectives. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence (pp. 115–120). Aix-en-Provence, France: Laboratoire Parole et Langage.
Ramus, F. (2002b). Language discrimination by newborns: Teasing apart phonotactic, rhythmic, and intonational cues. Annual Review of Language Acquisition, 2:85–115.
Ramus, F. (2003). Dyslexia: Specific phonological deficit or general sensorimotor dysfunction? Current Opinion in Neurobiology, 13:212–218.
Ramus, F. (2006). A neurological model of dyslexia and other domain-specific developmental disorders with an associated sensorimotor syndrome. In: G. D. Rosen (Ed.), The Dyslexic Brain: New Pathways in Neuroscience Discovery (pp. 75–101). Mahwah, NJ: Lawrence Erlbaum.
Ramus, F., Dupoux, E., & Mehler, J. (2003). The psychological reality of rhythm classes: Perceptual studies. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 337–342.
Ramus, F., Hauser, M. D., Miller, C., Morris, D., & Mehler, J. (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science, 288:349–351.
Ramus, F., & Mehler, J. (1999). Language identification with suprasegmental cues: A study based on speech resynthesis. Journal of the Acoustical Society of America, 105:512–521.
Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73:265–292.
Randel, D. M. (1978). Harvard Concise Dictionary of Music. Cambridge, MA: Harvard University Press.
Rantala, V., Rowell, L., & Tarasti, E. (Eds.). (1988). Essays on the Philosophy of Music, Acta Philosophica Fennica (Vol. 43). Helsinki: The Philosophical Society of Finland.
Ratner, L. (1980). Classic Music: Expression, Form, and Style. New York: Schirmer Books.
Rauschecker, J. P. (1998a). Cortical processing of complex sounds. Current Opinion in Neurobiology, 8:516–521.
Rauschecker, J. P. (1998b). Parallel processing in the auditory cortex of primates. Audiology and Neuro-Otology, 3:86–103.
Reck, D. (1997). Music of the Whole Earth. New York: Da Capo Press.
Redi, L. (2003). Categorical effects in production of pitch contours in English. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 2921–2924.
Regnault, P., Bigand, E., & Besson, M. (2001). Different brain mechanisms mediate sensitivity to sensory consonance and harmonic context: Evidence from auditory event-related brain potentials. Journal of Cognitive Neuroscience, 13:241–255.
Remez, R. E., Rubin, P. E., Berns, S. M., Pardo, J. S., & Lang, J. M. (1994). On the perceptual organization of speech. Psychological Review, 101:129–156.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212:947–950.
Repp, B. H. (1984). Categorical perception: Issues, methods, findings. In: N. J. Lass (Ed.), Speech and Language: Advances in Basic Research and Practice (Vol. 10, pp. 243–335). New York: Academic Press.
Repp, B. H. (1992a). Probing the cognitive representation of musical time: Structural constraints on the perception of timing perturbations. Cognition, 44:241–281.
Repp, B. H. (1992b). Diversity and commonality in music performance: An analysis of timing microstructure in Schumann’s “Träumerei.” Journal of the Acoustical Society of America, 92:2546–2568.
Repp, B. H., & Williams, D. R. (1987). Categorical tendencies in imitating self-produced isolated vowels. Speech Communication, 6:1–14.
Rialland, A. (2003). A new perspective on Silbo Gomero. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 2131–2134.
Rialland, A. (2005). Phonological and phonetic aspects of whistled languages. Phonology, 22:237–271.
Richard, P. (1972). A quantitative analysis of the relationship between language tone and melody in a Hausa song. African Language Studies, 13:137–161.
Richards, I. A. (1979). Rhythm and metre. In: H. Gross (Ed.), The Structure of Verse (2nd ed., pp. 68–76). New York: Ecco Press.
Richter, D., Waiblinger, J., Rink, W. J., & Wagner, G. A. (2000). Thermoluminescence, electron spin resonance and C-14-dating of the late middle and early upper palaeolithic site of Geissenklosterle Cave in southern Germany. Journal of Archaeological Science, 27:71–89.
Rimland, B., & Fein, D. (1988). Special talents of autistic savants. In: L. K. Obler & D. Fein (Eds.), The Exceptional Brain: The Neuropsychology of Talent and Special Abilities (pp. 474–492). New York: Guilford.
Ringer, A. L. (2001). Melody. In: S. Sadie (Ed.), The New Grove Dictionary of Music and Musicians (Vol. 16, pp. 363–373). New York: Grove.
Risset, J.-C. (1991). Speech and music combined: An overview. In: J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, Language, Speech and Brain (pp. 368–379). London: Macmillan.
Risset, J.-C., & Wessel, D. (1999). Exploration of timbre by analysis and synthesis. In: D. Deutsch (Ed.), The Psychology of Music (2nd ed., pp. 113–169). San Diego, CA: Academic Press.
Rivenez, M., Gorea, A., Pressnitzer, D., & Drake, C. (2003). The tolerance window for sequences of musical, environmental, and artificial sounds. Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney, pp. 560–563.
Roach, P. (1982). On the distinction between “stress-timed” and “syllable-timed” languages. In: D. Crystal (Ed.), Linguistic Controversies: Essays in Linguistic Theory and Practice in Honour of F. R. Palmer (pp. 73–79). London: Edward Arnold.
Rohrmeier, M. (2007). A generative approach to diatonic harmonic structure. In: Proceedings of the 4th Sound and Music Computing Conference, Lefkada, Greece, pp. 97–100.
Römer, H., Hedwig, B., & Ott, S. R. (2002). Contralateral inhibition as a sensory bias: The neural basis for a female preference in a synchronously calling bushcricket, Mecopoda elongata. European Journal of Neuroscience, 15:1655–1662.
Rosch, E. (1973). Natural categories. Cognitive Psychology, 4:328–350.
Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, 7:532–547.
Rosen, S. (2003). Auditory processing in dyslexia and specific language impairment: Is there a deficit? What is its nature? Does it explain anything? Journal of Phonetics, 31:509–527.
Rosen, S., & Howell, P. (1991). Signals and Systems for Speech and Hearing. San Diego, CA: Academic Press.
Ross, D. A., Olson, I. R., & Gore, J. C. (2003). Absolute pitch does not depend on early musical training. Annals of the New York Academy of Sciences, 999:522–526.
Ross, E. (1981). The aprosodias: Functional-anatomic organization of the affective components of language in the right hemisphere. Archives of Neurology, 38:561–569.
Ross, E. (2000). Affective prosody and the aprosodias. In: M.-M. Mesulam (Ed.), Principles of Behavioral and Cognitive Neurology (pp. 316–331). Oxford, UK: Oxford University Press.
Ross, J. (1989). A study of timing in an Estonian runic song. Journal of the Acoustical Society of America, 86:1671–1677.
Ross, J., & Lehiste, I. (2001). The Temporal Structure of Estonian Runic Songs. Berlin, Germany: Mouton de Gruyter.
Ross, J. M., & Lehiste, I. (1998). Timing in Estonian folksongs as interaction between speech prosody, metre, and musical rhythm. Music Perception, 15:319–333.
Rossi, M. (1971). Le seuil de glissando ou seuil de perception des variations tonales pour la parole. Phonetica, 23:1–33.
Rossi, M. (1978a). La perception des glissandos descendants dans les contours prosodiques. Phonetica, 35:11–40.
Rossi, M. (1978b). Interactions of intensity glides and frequency glissandos. Language and Speech, 21:384–396.
Rothstein, E. (1996). Emblems of Mind: The Inner Life of Music and Mathematics. New York: Avon Books.
Russell, J. A. (1989). Measures of emotion. In: R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, Research, and Experience (Vol. 4, pp. 81–111). New York: Academic Press.
Russolo, L. (1986). The Art of Noises. New York: Pendragon Press.
Rymer, R. (1993). Genie: A Scientific Tragedy. New York: Harper Perennial.
Sacks, O. (1984). A Leg to Stand On. New York: Summit Books.
Sacks, O. (2007). Musicophilia: Tales of Music and the Brain. New York: Knopf.
Sadakata, M., Ohgushi, K., & Desain, P. (2004). A cross-cultural comparison study of the production of simple rhythmic patterns. Psychology of Music, 32:389–403.
Saffran, J. R. (2003). Absolute pitch in infancy and adulthood: The role of tonal structure. Developmental Science, 6:37–45.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274:1926–1928.
Saffran, J. R., & Griepentrog, G. J. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology, 37:74–85.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70:27–52.
Saffran, J. R., Reeck, K., Niebuhr, A., & Wilson, D. (2005). Changing the tune: The structure of the input affects infants’ use of absolute and relative pitch. Developmental Science, 8:1–7.
Sandler, W., Meir, I., Padden, C., & Aronoff, M. (2005). The emergence of grammar: Systematic structure in a new language. Proceedings of the National Academy of Sciences, USA, 102:2661–2665.
Savage-Rumbaugh, S., Shanker, S. G., & Taylor, T. J. (1998). Apes, Language, and the Human Mind. New York: Oxford.
Schaefer, R., Murre, J., & Bod, R. (2004). Limits to universality in segmentation of simple melodies. In: S. D. Lipscomb et al. (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL (pp. 247–250). Adelaide, Australia: Causal Productions.
Schafer, A. J., Carter, J., Clifton, C., & Frazier, L. (1996). Focus in relative clause construal. Language and Cognitive Processes, 11:135–163.
Schaffrath, H. (1995). The Essen Folksong Collection in the Humdrum Kern Format, D. Huron (Ed.). Menlo Park, CA: Center for Computer Assisted Research in the Humanities.
Schegloff, E. A. (1982). Discourse as an interactional achievement: Some uses of “uh huh” and other things that come between sentences. In: D. Tannen (Ed.), Georgetown University Round Table on Linguistics 1981. Analysing Discourse: Text and Talk (pp. 71–93). Washington, DC: Georgetown University Press.
Scheirer, E., & Slaney, M. (1997)。Construction and evaluation of a robust multifeature speech/music discriminator,ICASSP-97 会议记录,德国慕尼黑(第 2 卷;第 1331-1334 页)。
Scheirer, E., & Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator, Proceedings of ICASSP-97, Munich, Germany (Vol. 2; pp. 1331–1334).
EG 谢伦伯格 (1996)。旋律中的期望:蕴涵实现模型的测试。认知, 58:75–125。
Schellenberg, E. G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition, 58:75–125.
EG 谢伦伯格 (1997)。简化旋律期望的蕴涵实现模型。音乐感知, 14:295–318。
Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. Music Perception, 14:295–318.
Schellenberg, EG、Adachi, M.、Purdy, KT 和 McKinnon, MC (2002)。旋律中的期望:儿童和成人的测试。实验心理学杂志:综合, 131:511-537。
Schellenberg, E. G., Adachi, M., Purdy, K. T., & McKinnon, M. C. (2002). Expectancy in melody: Tests of children and adults. Journal of Experimental Psychology: General, 131:511–537.
Schellenberg, E. G., & Trehub, S. E. (1996). Natural music intervals: Evidence from infant listeners. Psychological Science, 7:272–277.
Schellenberg, E. G., & Trehub, S. E. (1999). Redundancy, conventionality, and the discrimination of tone sequences: A developmental perspective. Journal of Experimental Child Psychology, 74:107–127.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological Science, 14:262–266.
Schenker, H. (1969). Five Graphic Music Analyses. New York: Dover.
Schenker, H. (1979). Free Composition. New York: Longmans.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99:143–165.
Scherer, K. R. (1995). Expression of emotion in voice and music. Journal of Voice, 9:235–248.
Scherer, K. R. (2004). Which emotions can be induced by music? What are the underlying mechanisms? And how can we measure them? Journal of New Music Research, 33:239–251.
Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotional inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32:76–92.
Schieffelin, B. (1985). The acquisition of Kaluli. In: D. Slobin (Ed.), The Crosslinguistic Study of Language Acquisition (pp. 525–594). Hillsdale, NJ: Erlbaum.
Schlaug, G., Jancke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum size in musicians. Neuropsychologia, 33:1047–1055.
Schmidt, L. A., & Trainor, L. J. (2001). Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition and Emotion, 15:487–500.
Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music Perception, 7:109–150.
Schmuckler, M. A. (1999). Testing models of melodic contour similarity. Music Perception, 16:295–326.
Schoenberg, A. (1911). Harmonielehre. Leipzig, Germany: Universal Edition.
Schulkind, M., Posner, R. J., & Rubin, D. C. (2003). Musical features that facilitate melody identification: How do you know it’s “your” song when they finally play it? Music Perception, 21:217–249.
Schulze, H.-H. (1989). Categorical perception of rhythmic patterns. Psychological Research, 51:10–15.
Schuppert, M., Münte, T. F., Wieringa, B. M., & Altenmüller, E. (2000). Receptive amusia: Evidence for cross-hemispheric neural networks underlying music processing strategies. Brain, 123:546–559.
Schwartz, A. B., Moran, D. W., & Reina, A. (2004). Differential representation of perception and action in the frontal cortex. Science, 303:380–383.
Schwartz, D. A., Howe, C. Q., & Purves, D. (2003). The statistical structure of human speech sounds predicts musical universals. Journal of Neuroscience, 23:7160–7168.
Scott, D. R. (1982). Duration as a cue to the perception of a phrase boundary. Journal of the Acoustical Society of America, 71:996–1007.
Scott, D. R., Isard, S. D., & Boysson-Bardies, B. de. (1985). Perceptual isochrony in English and French. Journal of Phonetics, 13:155–162.
Scott, S. K., & Johnsrude, I. S. (2003). The neuroanatomical and functional organization of speech perception. Trends in Neurosciences, 26:100–107.
Seashore, C. (1938). Psychology of Music. New York: McGraw-Hill.
Sebeok, T. A., & Umiker-Sebeok, D. J. (Eds.). (1976). Speech Surrogates: Drum and Whistle Systems (2 Vols.). The Hague, The Netherlands: Mouton.
Seeger, A. (1987). Why Suya Sing: A Musical Anthropology of an Amazonian People. Cambridge, UK: Cambridge University Press.
Seifritz, E., Neuhoff, J. G., Bilecen, D., Scheffler, D., Mustovic, H., Schachinger, H., Elefante, R., & Di Salle, F. (2002). Neural processing of auditory “looming” in the human brain. Current Biology, 12:2147–2151.
Selfridge-Field, E. (1995). The Essen Musical Data Package. CCARH (Center for Computer Assisted Research in the Humanities), Technical Report No. 1. Menlo Park, CA: CCARH.
Selkirk, E. O. (1981). On the nature of phonological representation. In: J. Anderson, J. Laver, & T. Myers (Eds.), The Cognitive Representation of Speech. Amsterdam: North Holland.
Selkirk, E. O. (1984). Phonology and Syntax: The Relation Between Sound and Structure. Cambridge, MA: MIT Press.
Semal, C., Demany, L., Ueda, K., & Halle, P. A. (1996). Speech versus nonspeech in pitch memory. Journal of the Acoustical Society of America, 100:1132–1140.
Semendeferi, K., Lu, A., Schenker, N., & Damasio, H. (2002). Humans and great apes share a large frontal cortex. Nature Neuroscience, 5:272–276.
Senghas, A., & Coppola, M. (2001). Children creating language: How Nicaraguan Sign Language acquired a spatial grammar. Psychological Science, 12:323–328.
Senghas, A., Kita, S., & Özyürek, A. (2004). Children creating core properties of language: Evidence from an emerging sign language in Nicaragua. Science, 305:1779–1782.
Sethares, W. A. (1999). Tuning, Timbre, Spectrum, Scale. London: Springer.
Shamma, S., & Klein, D. (2000). The case of the missing pitch templates: How harmonic templates emerge in the early auditory system. Journal of the Acoustical Society of America, 107:2631–2644.
Shamma, S. A., Fleshman, J. W., Wiser, P. R., & Versnel, H. (1993). Organization of response areas in ferret primary auditory cortex. Journal of Neurophysiology, 69:367–383.
Shattuck-Hufnagel, S., Ostendorf, M., & Ross, K. (1994). Stress shift and early pitch accent placement in lexical items in American English. Journal of Phonetics, 22:357–388.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25:193–247.
Shenfield, T., Trehub, S. E., & Nakata, T. (2003). Maternal singing modulates infant arousal. Psychology of Music, 31:365–375.
Shepard, R. N. (1982). Structural representations of musical pitch. In: D. Deutsch (Ed.), The Psychology of Music (pp. 343–390). Orlando, FL: Academic Press.
Shepard, R. N., & Jordan, D. S. (1984). Auditory illusions demonstrating that tones are assimilated to an internalized musical scale. Science, 226:1333–1334.
Shields, J. L., McHugh, A., & Martin, J. G. (1974). Reaction time to phoneme targets as a function of rhythmic cues in continuous speech. Journal of Experimental Psychology, 102:250–255.
Shriberg, E., Ladd, D. R., Terken, J., & Stolcke, A. (2002). Modeling pitch range variation within and across speakers: Predicting F0 targets when “speaking up.” In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence. Aix-en-Provence, France: Laboratoire Parole et Langage.
Silk, J. B., Alberts, S. C., & Altmann, J. (2003). Social bonds of female baboons enhance infant survival. Science, 302:1231–1234.
Singer, A. (1974). The metrical structure of Macedonian dance. Ethnomusicology, 18:379–404.
Singh, L., Morgan, J. L., & Best, C. T. (2002). Infants’ listening preferences: Baby talk or happy talk? Infancy, 3:365–394.
Slater, P. J. B. (2000). Birdsong repertoires: Their origins and use. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 49–63). Cambridge, MA: MIT Press.
Slevc, L. R., & Miyake, A. (2006). Individual differences in second language proficiency: Does musical ability matter? Psychological Science, 17:675–681.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin and Review, 16:374–381.
Sloboda, J. (1983). The communication of musical metre in piano performance. Quarterly Journal of Experimental Psychology, 35A:377–396.
Sloboda, J. (1985). The Musical Mind. Oxford, UK: Oxford University Press.
Sloboda, J. A. (1991). Music structure and emotional response: Some empirical findings. Psychology of Music, 19:110–120.
Sloboda, J. A. (1998). Does music mean anything? Musicae Scientiae, 2:21–31.
Sloboda, J. A., & Gregory, A. H. (1980). The psychological reality of musical segments. Canadian Journal of Psychology/Revue Canadienne de Psychologie, 34:274–280.
Sloboda, J. A., & Juslin, P. N. (2001). Music and emotion: Commentary. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 453–462). Oxford, UK: Oxford University Press.
Sloboda, J. A., & O’Neill, S. A. (2001). Emotions in everyday listening to music. In: P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 415–429). Oxford, UK: Oxford University Press.
Sloboda, J. A., O’Neill, S. A., & Ivaldi, A. (2001). Functions of music in everyday life: An exploratory study using the experience sampling methodology. Musicae Scientiae, 5:9–32.
Sloboda, J. A., & Parker, D. H. H. (1985). Immediate recall of melodies. In: P. Howell, I. Cross, & R. West (Eds.), Musical Structure and Cognition (pp. 143–167). London: Academic Press.
Sloboda, J. A., Wise, K. J., & Peretz, I. (2005). Quantifying tone deafness in the general population. Annals of the New York Academy of Sciences, 1060:255–261.
Sluijter, A. J. M., & van Heuven, V. J. (1996). Spectral balance as an acoustic correlate of linguistic stress. Journal of the Acoustical Society of America, 100:2471–2485.
Sluijter, A. J. M., van Heuven, V. J., & Pacilly, J. J. A. (1997). Spectral balance as a cue in the perception of linguistic stress. Journal of the Acoustical Society of America, 101:503–513.
Smiljanic, R., & Bradlow, A. R. (2005). Production and perception of clear speech in Croatian and English. Journal of the Acoustical Society of America, 118:1677–1688.
Smith, J. D. (1997). The place of musical novices in music science. Music Perception, 14:227–262.
Smith, J. D., Kelmer Nelson, D. G., Grohskopf, L. A., & Appleton, T. (1994). What child is this? What interval was that? Familiar tunes and music perception in novice listeners. Cognition, 52:23–54.
Smith, J. D., & Melara, R. J. (1990). Aesthetic preference and syntactic prototypicality in music: ’Tis the gift to be simple. Cognition, 34:279–298.
Smith, N., & Cuddy, L. L. (2003). Perceptions of musical dimensions in Beethoven’s Waldstein sonata: An application of Tonal Pitch Space theory. Musicae Scientiae, 7:7–34.
Snyder, J., & Krumhansl, C. L. (2001). Tapping to ragtime: Cues to pulse finding. Music Perception, 18:455–489.
Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic tone sequences. Cognitive Brain Research, 24:117–126.
Sober, E. (1984). The Nature of Selection: Evolutionary Theory in Philosophical Focus. Cambridge, MA: MIT Press.
Soto-Faraco, S., Sebastián-Gallés, N., & Cutler, A. (2001). Segmental and suprasegmental mismatch in lexical access. Journal of Memory and Language, 45:412–432.
Speer, S. R., Warren, P., & Schafer, A. J. (2003, August 3–9). Intonation and sentence processing. Proceedings of the Fifteenth International Congress of Phonetic Sciences, Barcelona.
Spencer, H. (1857). The origin and function of music. Fraser’s Magazine, 56:396–408.
Steele, J. (1779). Prosodia Rationalis: Or, An Essay Toward Establishing the Melody and Measure of Speech, to Be Expressed and Perpetuated by Peculiar Symbols (2nd ed.). London: J. Nichols. (Reprinted by Georg Olms Verlag, Hildesheim, 1971)
Steinbeis, N., & Koelsch, S. (in press). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex.
Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of harmonic expectancy violations in musical emotions: Evidence from subjective, physiological, and neural responses. Journal of Cognitive Neuroscience, 18:1380–1393.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2:191–196.
Steinhauer, K., & Friederici, A. D. (2001). Prosodic boundaries, comma rules, and brain responses: The closure positive shift in ERPs as a universal marker for prosodic phrasing in listeners and readers. Journal of Psycholinguistic Research, 30:267–295.
Steinke, W. R., Cuddy, L. L., & Holden, R. R. (1997). Dissociation of musical tonality and pitch memory from nonmusical cognitive abilities. Canadian Journal of Experimental Psychology, 51:316–334.
Steinschneider, M., Volkov, I. O., Fishman, Y. I., Oya, H., Arezzo, J. C., & Howard, M. A. (2005). Intracortical responses in human and monkey primary auditory cortex support a temporal processing mechanism for encoding of the voice onset time phonetic parameter. Cerebral Cortex, 15:170–186.
Stern, T. (1957). Drum and whistle “languages”: An analysis of speech surrogates. American Anthropologist, 59:487–506.
Stetson, R. H. (1951). Motor Phonetics, a Study of Speech Movements in Action. Amsterdam: North Holland.
Stevens, J. (2002). The Songs of John Lennon: The Beatles Years. Boston: Berklee Press.
Stevens, K. N. (1997). Articulatory-acoustic-auditory relationships. In: W. J. Hardcastle & J. Laver (Eds.), The Handbook of Phonetic Sciences (pp. 463–506). Oxford, UK: Blackwell.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17:3–45.
Stevens, K. N. (1998). Acoustic Phonetics. Cambridge, MA: MIT Press.
Stevens, K. N., Liberman, A. M., Studdert-Kennedy, M., & Ohman, S. E. G. (1969). Crosslanguage study of vowel perception. Language and Speech, 12:1–23.
Stewart, L., Henson, R., Kampe, K., Walsh, V., Turner, R., & Frith, U. (2003). Becoming a pianist: Brain changes associated with learning to read and play music. NeuroImage, 20:71–83.
Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the brain: Disorders of musical listening. Brain, 129:2533–2553.
Stobart, H., & Cross, I. (2000). The Andean anacrusis? Rhythmic structure and perception in Easter songs of Northern Potosi, Bolivia. British Journal of Ethnomusicology, 9:63–94.
Stoffer, T. H. (1985). Representation of phrase structure in the perception of music. Music Perception, 3:191–220.
Stokes, M. (Ed.). (1994). Ethnicity, Identity, and Music: The Musical Construction of Place. Oxford, UK: Berg.
Stone, R. M. (1982). Let the Inside Be Sweet: The Interpretation of Music Event Among the Kpelle of Liberia. Bloomington: Indiana University Press.
Stratton, V. N., & Zalanowski, A. H. (1994). Affective impact of music vs. lyrics. Empirical Studies of the Arts, 12:173–184.
Streeter, L. A. (1978). Acoustic determinants of phrase boundary perception. Journal of the Acoustical Society of America, 64:1582–1592.
Strogatz, S. (2003). Sync: The Emerging Science of Spontaneous Order. New York: Hyperion.
Stumpf, C. (1883). Tonpsychologie (Vol. 1). Leipzig, Germany: S. Hirzel.
Sundberg, J. (1982). Speech, song, and emotions. In: M. Clynes (Ed.), Music, Mind and Brain: The Neuropsychology of Music (pp. 137–149). New York: Plenum Press.
Sundberg, J. (1987). The Science of the Singing Voice. DeKalb, IL: Northern Illinois University Press.
Sundberg, J. (1994). Musical significance of musicians’ syllable choice in improvised nonsense text singing: A preliminary study. Phonetica, 54:132–145.
Sundberg, J., & Lindblom, B. (1976). Generative theories in language and music descriptions. Cognition, 4:99–122.
Sutton, R. A. (2001). Asia/Indonesia. In: J. T. Titon (Gen. Ed.), Worlds of Music: An Introduction to the Music of the World’s Peoples (Shorter Version) (pp. 179–209). Belmont, CA: Thompson Learning.
Swaab, T. Y., Brown, C. M., & Hagoort, P. (1998). Understanding ambiguous words in sentence contexts: Electrophysiological evidence for delayed contextual selection in Broca’s aphasia. Neuropsychologia, 36:737–761.
Swain, J. (1997). Musical Languages. New York: Norton.
’t Hart, J. (1976). Psychoacoustic backgrounds of pitch contour stylization. I.P.O. Annual Progress Report, 11:11–19.
’t Hart, J., & Collier, R. (1975). Integrating different levels of intonation analysis. Journal of Phonetics, 3:235–255.
’t Hart, J., Collier, R., & Cohen, A. (1990). A Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody. Cambridge, UK: Cambridge University Press.
Tagg, P., & Clarida, B. (2003). Ten Little Tunes. New York and Montreal: The Mass Media Music Scholar’s Press.
Takayama, Y., et al. (1993). A case of foreign accent syndrome without aphasia caused by a lesion of the left precentral gyrus. Neurology, 43:1361–1363.
Takeuchi, A. H. (1994). Maximum key-profile correlation (MKC) as a measure of tonal structure in music. Perception and Psychophysics, 56:335–346.
Takeuchi, A., & Hulse, S. (1993). Absolute pitch. Psychological Bulletin, 113:345–361.
Tallal, P., & Gaab, N. (2006). Dynamic auditory processing, musical experience and language development. Trends in Neurosciences, 29:382–390.
Tan, N., Aiello, R., & Bever, T. G. (1981). Harmonic structure as a determinant of melodic organization. Memory and Cognition, 9:533–539.
Tan, S.-L., & Kelly, M. E. (2004). Graphic representations of short musical compositions. Psychology of Music, 32:191–212.
Tan, S.-L., & Spackman, M. P. (2005). Listener’s judgments of musical unity of structurally altered and intact musical compositions. Psychology of Music, 33:133–153.
Taylor, D. S. (1981). Non-native speakers and the rhythm of English. International Review of Applied Linguistics, 19:219–226.
Tekman, H. G., & Bharucha, J. J. (1998). Implicit knowledge versus psychoacoustic similarity in priming of chords. Journal of Experimental Psychology: Human Perception and Performance, 24:252–260.
Temperley, D. (1999). Syncopation in rock: A perceptual perspective. Popular Music, 18:19–40.
Temperley, D. (2000). Meter and grouping in African music: A view from music theory. Ethnomusicology, 44:65–96.
Temperley, D. (2004). Communicative pressure and the evolution of musical styles. Music Perception, 21:313–337.
Temperley, D., & Bartlette, C. (2002). Parallelism as a factor in metrical analysis. Music Perception, 20:117–149.
Teramitsu, I., Kudo, L. C., London, S. E., Geschwind, D. H., & White, S. A. (2004). Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. Journal of Neuroscience, 24:3152–3163.
Terhardt, E. (1984). The concept of musical consonance: A link between music and psychoacoustics. Music Perception, 1:276–295.
Terken, J. (1991). Fundamental frequency and perceived prominence of accented syllables. Journal of the Acoustical Society of America, 89:1768–1776.
Terken, J., & Hermes, D. J. (2000). The perception of prosodic prominence. In: M. Horne (Ed.), Prosody: Theory and Experiment, Studies Presented to Gosta Bruce (pp. 89–127). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Tervaniemi, M., Kujala, A., Alho, K., Virtanen, J., Ilmoniemi, R. J., & Naatanen, R. (1999). Functional specialization of the human auditory cortex in processing phonetic and musical sounds: A magnetoencephalographic (MEG) study. NeuroImage, 9:330–336.
Tervaniemi, M., Medvedev, S. V., Alho, K., Pakhomov, S. V., Roudas, M. S., van Zuijen, T. L., & Naatanen, R. (2000). Lateralized automatic auditory processing of phonetic versus musical information: A PET study. Human Brain Mapping, 10:74–79.
Tervaniemi, M., Szameitat, A. J., Kruck, S., Schröger, E., Alter, K., De Baene, W., & Friederici, A. (2006). From air oscillations to music and speech: fMRI evidence for fine-tuned neural networks in audition. Journal of Neuroscience, 26:8647–8652.
Thaut, M. H., Kenyon, G. P., Schauer, M. L., & McIntosh, G. C. (1999). The connection between rhythmicity and brain function: Implications for therapy of movement disorders. IEEE Transactions on Engineering Biology and Medicine, 18:101–108.
Thierry, E. (2002). Les langages siffles. Unpublished dissertation, Ecole Pratique des Hautes Etudes IV, Paris.
Thomassen, J. M. (1983). Melodic accent: Experiments and a tentative model. Journal of the Acoustical Society of America, 71:1596–1605.
Thompson, A. D., Jr., & Baker, M. C. (1993). Song dialect recognition by male white-crowned sparrows: Effects of manipulated song components. The Condor, 95:414–421.
Thompson, W. F., & Balkwill, L.-L. (2006). Decoding speech prosody in five languages. Semiotica, 158(1/4):407–424.
Thompson, W. F., & Cuddy, L. L. (1992). Perceived key movement in four-voice harmony and single voices. Music Perception, 9:427–438.
Thompson, W. F., & Russo, F. A. (2004). The attribution of meaning and emotion to song lyrics. Polskie Forum Psychologiczne, 9:51–62.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion, 4:46–64.
Thorsen, N. (1980). A study of the perception of sentence intonation: Evidence from Danish. Journal of the Acoustical Society of America, 67:1014–1030.
Tillmann, B. (2005). Implicit investigations of tonal knowledge in nonmusician listeners. Annals of the New York Academy of Sciences, 1060:100–110.
Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-organizing approach. Psychological Review, 107:885–913.
Tillmann, B., & Bigand, E. (1996). Does formal musical structure affect perception of musical expressiveness? Psychology of Music, 24:3–17.
Tillmann, B., & Bigand, E. (2001). Global context effects in normal and scrambled musical sequences. Journal of Experimental Psychology: Human Perception and Performance, 27:1185–1196.
Tillmann, B.、Bigand, E. 和 Pineau, M. (1998)。本地和全球环境对谐波预期的影响,音乐感知, 16:99–118。
Tillmann, B., Bigand, E., & Pineau, M. (1998). Effect of local and global contexts on harmonic expectancy, Music Perception, 16:99–118.
Tillmann, B.、Janata, P. 和 Bharucha, JJ (2003)。音乐启动中下额叶皮层的激活。认知脑研究, 16:145–161。
Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Cognitive Brain Research, 16:145–161.
Titon, JT(主编)。(1996)。音乐世界:世界人民的音乐介绍(第 3 版)。纽约:希尔默。托德,NP 麦卡。(1985)。音调音乐中表达时间的模型。音乐感知, 3:33–58。
Titon, J. T. (Ed.). (1996). Worlds of Music: An Introduction to the Music of the World’s Peoples (3rd ed.). New York: Schirmer. Todd, N. P. McA. (1985). A model of expressive timing in tonal music. Music Perception, 3:33–58.
Todd, N. P. McA. (1999). Motion in music: A neurobiological perspective. Music Perception, 17:115–126.
Todd, N. P. McA., O’Boyle, D. J., & Lee, C. S. (1999). A sensory-motor theory of rhythm, time perception and beat induction. Journal of New Music Research, 28:5–28.
Toga, A. W., Thompson, P. M., & Sowell, E. R. (2006). Mapping brain maturation. Trends in Neurosciences, 29:148–159.
Toiviainen, P., & Eerola, T. (2003). Where is the beat? Comparison of Finnish and South African listeners. In: R. Kopiez, A. C. Lehmann, I. Wolther, & C. Wolf (Eds.), Proceedings of the 5th Triennial ESCOM Conference (pp. 501–504). Hanover, Germany: Hanover University of Music and Drama.
Toiviainen, P., & Krumhansl, C. L. (2003). Measuring and modeling real-time responses to music: The dynamics of tonality induction. Perception, 32:741–766.
Toiviainen, P., & Snyder, J. S. (2003). Tapping to Bach: Resonance-based modeling of pulse. Music Perception, 21:43–80.
Tojo, S., Oka, Y., & Nishida, M. (2006). Analysis of chord progression by HPSG. In: Proceedings of the 24th IASTED International Multi-Conference (Artificial Intelligence and Applications), pp. 305–310. Innsbruck, Austria.
Tomasello, M. (1995). Language is not an instinct. Cognitive Development, 10:131–156.
Tomasello, M. (2003). On the different origins of symbols and grammar. In: M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 94–110). Oxford, UK: Oxford University Press.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28:675–691.
Traill, A. (1994). A !Xóõ Dictionary. Cologne, Germany: Rüdiger Köppe.
Trainor, L. J. (2005). Are there critical periods for music development? Developmental Psychobiology, 46:262–278.
Trainor, L. J., & Adams, B. (2000). Infants’ and adults’ use of duration and intensity cues in the segmentation of tone patterns. Perception and Psychophysics, 62:333–340.
Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science, 11:188–195.
Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of preferences for infant-directed singing. Infant Behavior and Development, 20:383–396.
Trainor, L. J., & Heinmiller, B. M. (1998). The development of evaluative responses to music: Infants prefer to listen to consonance over dissonance. Infant Behavior and Development, 21:77–88.
Trainor, L. J., McDonald, K. L., & Alain, C. (2002). Automatic and controlled processing of melodic contour and interval information measured by electrical brain activity. Journal of Cognitive Neuroscience, 14:430–442.
Trainor, L. J., McFadden, M., Hodgson, L., Darragh, L., Barlow, J., et al. (2003). Changes in auditory cortex and the development of mismatch negativity between 2 and 6 months of age. International Journal of Psychophysiology, 51:5–15.
Trainor, L. J., & Schmidt, L. A. (2003). Processing emotions induced by music. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 310–324). Oxford, UK: Oxford University Press.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants’ and adults’ sensitivity to Western musical structure. Journal of Experimental Psychology: Human Perception and Performance, 18:394–402.
Trainor, L. J., & Trehub, S. E. (1993). Musical context effects in infants and adults: Key distance. Journal of Experimental Psychology: Human Perception and Performance, 19:615–626.
Trainor, L. J., & Trehub, S. E. (1994). Key membership and implied harmony in Western tonal music: Developmental perspectives. Perception and Psychophysics, 56:125–132.
Trainor, L. J., Tsang, C. D., & Cheung, V. H. W. (2002). Preference for consonance in 2-month-old infants. Music Perception, 20:185–192.
Tramo, M. J., Bharucha, J. J., & Musiek, F. E. (1990). Music perception and cognition following bilateral lesions of auditory cortex. Journal of Cognitive Neuroscience, 2:195–212.
Tramo, M. J., Cariani, P. A., Delgutte, B., & Braida, L. D. (2003). Neurobiology of harmony perception. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 127–151). New York: Oxford University Press.
Trehub, S. E. (2000). Human processing predispositions and musical universals. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 427–448). Cambridge, MA: MIT Press.
Trehub, S. E. (2003a). The developmental origins of musicality. Nature Neuroscience, 6:669–673.
Trehub, S. E. (2003b). Musical predispositions in infancy: An update. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 3–20). New York: Oxford University Press.
Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants’ perception of melodies: The role of melodic contour. Child Development, 55:821–830.
Trehub, S. E., & Hannon, E. E. (2006). Infant music perception: Domain-general or domain-specific mechanisms? Cognition, 100:73–99.
Trehub, S. E., Morrongiello, B. A., & Thorpe, L. A. (1985). Children’s perception of familiar melodies: The role of intervals, contour, and key. Psychomusicology, 5:39–48.
Trehub, S. E., Schellenberg, E. G., & Hill, D. (1997). The origins of music perception and cognition: A developmental perspective. In: I. Deliege & J. A. Sloboda (Eds.), Perception and Cognition of Music (pp. 103–128). Hove, UK: Psychology Press.
Trehub, S. E., Schellenberg, E. G., & Kamenetsky, S. B. (1999). Infants’ and adults’ perception of scale structure. Journal of Experimental Psychology: Human Perception and Performance, 25:965–975.
Trehub, S. E., & Thorpe, L. A. (1989). Infants’ perception of rhythm: Categorization of auditory sequences by temporal structure. Canadian Journal of Psychology, 43:217–229.
Trehub, S. E., Thorpe, L. A., & Morrongiello, B. A. (1987). Organizational processes in infants’ perception of auditory patterns. Child Development, 58:741–749.
Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and playsongs. Advances in Infancy Research, 12:43–77.
Trehub, S. E., Unyk, A. M., Kamenetsky, S. B., Hill, D. S., Trainor, L. J., Henderson, J. L., et al. (1997). Mothers’ and fathers’ singing to infants. Developmental Psychology, 33:500–507.
Trehub, S. E., Unyk, A. M., & Trainor, L. J. (1993). Maternal singing in cross-cultural perspective. Infant Behavior and Development, 16:285–295.
Trevarthen, C. (1999). Musicality and the intrinsic motive pulse: Evidence from human psychobiology and infant communication [Special issue 1999–2000]. Musicae Scientiae, 155–215.
Tronick, E. Z., Als, H., Adamson, L., Wise, S., & Brazelton, T. B. (1978). The infant’s response to entrapment between contradictory messages in face-to-face interaction. Journal of the American Academy of Child Psychiatry, 17:1–13.
Trout, J. D. (2003). Biological specializations for speech: What can animals tell us? Current Directions in Psychological Science, 12:155–159.
Tsukada, K. (1997). Drumming, onomatopoeia and sound symbolism among the Luvale of Zambia. In: J. Kawada (Ed.), Cultures Sonores d’Afrique (pp. 349–393). Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa (ILCAA).
Tyack, P. L., & Clark, C. W. (2000). Communication and acoustic behavior in whales and dolphins. In: W. W. L. Au, A. N. Popper, & R. R. Fay (Eds.), Hearing by Whales and Dolphins (pp. 156–224). New York: Springer.
Tzortzis, C., Goldblum, M.-C., Dang, M., Forette, F., & Boller, F. (2000). Absence of amusia and preserved naming of musical instruments in an aphasic composer. Cortex, 36:227–242.
Ulanovsky, N., Las, L., & Nelken, I. (2003). Processing of low-probability sounds by cortical neurons. Nature Neuroscience, 6:391–398.
Ullman, M. T. (2001). A neurocognitive perspective on language: The declarative/procedural model. Nature Reviews (Neuroscience), 2:717–726.
Umeda, N. (1982). F0 declination is situation dependent. Journal of Phonetics, 20:279–290.
Unyk, A. M., Trehub, S. E., Trainor, L. J., & Schellenberg, E. G. (1992). Lullabies and simplicity: A cross-cultural perspective. Psychology of Music, 20:15–28.
Vaissiere, J. (1983). Language-independent prosodic features. In: A. Cutler & D. R. Ladd (Eds.), Prosody: Models and Measurements (pp. 53–66). Berlin, Germany: Springer.
van de Weijer, J. (1998). Language Input for Word Discovery. Nijmegen: Max Planck Institute for Psycholinguistics.
van Gulik, R. H. (1940). The Lore of the Chinese Lute: An Essay in Ch’in Ideology. Tokyo: Sophia University.
Van Khê, T. (1977). Is the pentatonic universal? A few reflections on pentatonism. The World of Music, 19:76–91.
van Noorden, L., & Moelants, D. (1999). Resonance in the perception of musical pulse. Journal of New Music Research, 28:43–66.
van Ooyen, B., Bertoncini, J., Sansavini, A., & Mehler, J. (1997). Do weak syllables count for newborns? Journal of the Acoustical Society of America, 102:3735–3741.
Van Valin, R. D. (2001). An Introduction to Syntax. Cambridge, UK: Cambridge University Press.
Vanhaeren, M., d’Errico, F., Stringer, C., James, S. L., Todd, J. A., & Mienis, H. K. (2006). Middle Paleolithic shell beads in Israel and Algeria. Science, 312:1785–1788.
Vargha-Khadem, F., Watkins, K., Alcock, K., Fletcher, P., & Passingham, R. (1995). Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences, USA, 92:930–933.
Vargha-Khadem, F., Watkins, K. E., Price, C. J., Ashburner, J., Alcock, K. J., Connely, A., et al. (1998). Neural basis of an inherited speech and language disorder. Proceedings of the National Academy of Sciences, USA, 95:12695–12700.
Vasishth, S., & Lewis, R. L. (2006). Argument-head distance and processing complexity: Explaining both locality and anti-locality effects. Language, 82:767–794.
Vassilakis, P. (2005). Auditory roughness as means of musical expression. Selected Reports in Ethnomusicology, 12:119–144.
Voisin, F. (1994). Musical scales in central Africa and Java: Modeling by synthesis. Leonardo Music Journal, 4:85–90.
von Hippel, P., & Huron, D. (2000). Why do skips precede reversals? The effect of tessitura on melodic structure. Music Perception, 18:59–85.
von Steinbuchel, N. (1998). Temporal ranges of central nervous processing: Clinical evidence. Experimental Brain Research, 123:220–233.
Vos, P. G., & Troost, J. M. (1989). Ascending and descending melodic intervals: Statistical findings and their perceptual relevance. Music Perception, 6:383–396.
Vouloumanos, A., & Werker, J. F. (2004). Tuned to the signal: The privileged status of speech for young infants. Developmental Science, 7:270–276.
Wagner, P. S., & Dellwo, V. (2004). Introducing YARD (Yet Another Rhythm Determination) and re-introducing isochrony to rhythm research. Proceedings of Speech Prosody 2004, Nara, Japan.
Walker, R. (1990). Musical Beliefs. New York: Teachers College Press.
Wallin, N. L., Merker, B., & Brown, S. (Eds.). (2000). The Origins of Music. Cambridge, MA: MIT Press.
Ward, W. D. (1999). Absolute pitch. In: D. Deutsch (Ed.), The Psychology of Music (2nd ed., pp. 265–298). San Diego, CA: Academic Press.
Warren, T., & Gibson, E. (2002). The influence of referential processing on sentence complexity. Cognition, 85:79–112.
Watanabe, S., & Nemoto, M. (1998). Reinforcing property of music in Java sparrows (Padda oryzivora). Behavioural Processes, 43:211–218.
Watkins, K. E., Dronkers, N. F., & Vargha-Khadem, F. (2002). Behavioural analysis of an inherited speech and language disorder: Comparison with acquired aphasia. Brain, 125:452–464.
Watson, D., & Gibson, E. (2004). The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes, 19:713–755.
Watt, R. J., & Ash, R. L. (1998). A psychological investigation of meaning in music. Musicae Scientiae, 2:33–53.
Webb, D. M., & Zhang, J. (2005). FoxP2 in song-learning birds and vocal-learning mammals. Journal of Heredity, 96:212–216.
Wedekind, K. (1983). A six-tone language in Ethiopia. Journal of Ethiopian Studies, 16:129–156.
Wedekind, K. (1985). Thoughts when drawing a map of tone languages. Afrikanistische Arbeitspapiere, 1:105–124.
Weisman, R. G., Njegovan, M. G., Williams, M. T., Cohen, J. S., & Sturdy, C. B. (2004). A behavior analysis of absolute pitch: Sex, experience, and species. Behavioural Processes, 66:289–307.
Welmers, W. E. (1973). African Language Structures. Berkeley: University of California Press.
Wenk, B. J. (1987). Just in time: On speech rhythms in music. Linguistics, 25:969–981.
Wenk, B. J., & Wioland, F. (1982). Is French really syllable-timed? Journal of Phonetics, 10:193–216.
Wennerstrom, A. (2001). The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford, UK: Oxford University Press.
Werker, J. F., & Curtin, S. (2005). PRIMIR: A developmental framework of infant speech processing. Language Learning and Development, 1:197–234.
Werker, J. F., Gilbert, J. V. H., Humphrey, K., & Tees, R. C. (1981). Developmental aspects of cross-language speech perception. Child Development, 52:349–355.
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7:49–63.
Werker, J. F., & Tees, R. C. (1999). Influences on infant speech processing: Toward a new synthesis. Annual Review of Psychology, 50:509–535.
Wetzel, W., Wagner, T., Ohl, F. W., & Scheich, H. (1998). Right auditory cortex lesion in Mongolian gerbils impairs discrimination of rising and falling frequency-modulated tones. Neuroscience Letters, 252:115–118.
Whalen, D. H., & Levitt, A. G. (1995). The universality of intrinsic F0 of vowels. Journal of Phonetics, 23:349–366.
Whalen, D. H., Levitt, A. G., & Wang, Q. (1991). Intonational differences between the reduplicative babbling of French- and English-learning infants. Journal of Child Language, 18:501–516.
Whalen, D. H., & Xu, Y. (1992). Information for Mandarin tones in the amplitude contour and in brief segments. Phonetica, 49:25–47.
Whaling, C. S. (2000). What’s behind a song? The neural basis of song learning in birds. In: N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 65–76). Cambridge, MA: MIT Press.
Whaling, C. S., Solis, M. M., Doupe, A. J., Soha, J. A., & Marler, P. (1997). Acoustic and neural bases for innate recognition of song. Proceedings of the National Academy of Sciences, USA, 94:12694–12698.
White, L. S., & Mattys, S. L. (2007). Rhythmic typology and variation in first and second languages. In: P. Prieto, J. Mascaro, & M.-J. Sole (Eds.), Segmental and Prosodic Issues in Romance Phonology (pp. 237–257). Current Issues in Linguistic Theory Series. Amsterdam: John Benjamins.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P. J. (1992). Segmental durations in the vicinity of prosodic boundaries. Journal of the Acoustical Society of America, 91:1707–1717.
Will, U., & Ellis, C. (1994). Evidence for linear transposition in Australian Western Desert vocal music. Musicology Australia, 17:2–12.
Will, U., & Ellis, C. (1996). A re-analyzed Australian Western Desert song: Frequency performance and interval structure. Ethnomusicology, 40:187–222.
Willems, N. (1982). English Intonation From a Dutch Point of View. Dordrecht, The Netherlands: Foris.
Williams, B., & Hiller, S. M. (1994). The question of randomness in English foot timing: A control experiment. Journal of Phonetics, 22:423–439.
Willmes, K., & Poeck, K. (1993). To what extent can aphasic syndromes be localized? Brain, 116:1527–1540.
Wilson, E. O. (1998). Consilience: The Unity of Knowledge. New York: Knopf.
Wilson, R. J. (1985). Introduction to Graph Theory (3rd ed.). Harlow, UK: Longman.
Wilson, S. J., Pressing, J. L., & Wales, R. J. (2002). Modelling rhythmic function in a musician post-stroke. Neuropsychologia, 40:1494–1505.
Windsor, W. L. (2000). Through and around the acousmatic: The interpretation of electroacoustic sounds. In: S. Emmerson (Ed.), Music, Electronic Media and Culture (pp. 7–35). Aldershot, UK: Ashgate Press.
Wingfield, P. (Ed.). (1999). Janáček Studies. Cambridge, UK: Cambridge University Press.
Winner, E. (1998). Talent: Don’t confuse necessity with sufficiency, or science with policy. Behavioral and Brain Sciences, 21:430–431.
Wolf, F., & Gibson, E. (2005). Representing discourse coherence: A corpus-based study. Computational Linguistics, 31:249–287.
Wong, P. C. M., & Diehl, R. L. (2002). How can the lyrics of a song in a tone language be understood? Psychology of Music, 30:202–209.
Wong, P. C. M., & Diehl, R. L. (2003). Perceptual normalization of inter- and intra-talker variation in Cantonese level tones. Journal of Speech, Language, and Hearing Research, 46:413–421.
Wong, P. C. M., Parsons, L. M., Martinez, M., & Diehl, R. L. (2004). The role of the insula cortex in pitch pattern perception: The effect of linguistic contexts. Journal of Neuroscience, 24:9153–9160.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10:420–422.
Woodrow, H. A. (1909). A quantitative study of rhythm: The effect of variations in intensity, rate and duration. Archives of Psychology, 14:1–66.
Wright, A. A., Rivera, J. J., Hulse, S. H., Shyan, M., & Neiworth, J. J. (2000). Music perception and octave generalization in rhesus monkeys. Journal of Experimental Psychology: General, 129:291–307.
Xu, Y. (1994). Production and perception of coarticulated tones. Journal of the Acoustical Society of America, 95:2240–2253.
Xu, Y. (1999). Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics, 27:55–105.
Xu, Y. (2006). Tone in connected discourse. In K. Brown (Ed.), Encyclopedia of Language and Linguistics (2nd ed., Vol. 12, pp. 742–750). Oxford, UK: Elsevier.
Xu, Y., & Sun, X. (2002). Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America, 111:1399–1413.
Xu, Y., & Xu, C. X. (2005). Phonetic realization of focus in English declarative intonation. Journal of Phonetics, 33:159–197.
Yamomoto, F. (1996). English speech rhythm studied in connection with British traditional music and dance. Journal of Himeji Dokkyo University Gaikokugogakubo, 9:224–243.
Youens, S. (1991). Retracing a Winter’s Journey: Schubert’s Winterreise. Ithaca, NY: Cornell University Press.
Yung, B. (1991). The relationship of text and tune in Chinese opera. In: J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, Language, Speech and Brain (pp. 408–418). London: MacMillan.
Zanto, T. P., Snyder, J. S., & Large, E. W. (2006). Neural correlates of rhythmic expectancy. Advances in Cognitive Psychology, 2:221–231.
Zatorre, R. J. (2003). Absolute pitch: A model for understanding the influence of genes and development on neural and cognitive function. Nature Neuroscience, 6:692–695.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6:37–46.
Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience, 14:1908–1919.
Zatorre, R. J., & Halpern, A. R. (1979). Identification, discrimination, and selective adaptation of simultaneous musical intervals. Perception and Psychophysics, 26:384–395.
Zatorre, R. J., Meyer, E., Gjedde, A., & Evans, A. C. (1996). PET studies of phonetic processing of speech: Review, replication, and reanalysis. Cerebral Cortex, 6:21–30.
Zbikowski, L. M. (1999). The blossoms of “Trockne Blumen”: Music and text in the early nineteenth century. Music Analysis, 18:307–345.
Zbikowski, L. M. (2002). Conceptualizing Music: Cognitive Structure, Theory, and Analysis. New York: Oxford University Press.
Zeigler, H. P., & Marler, P. (Eds.). (2004). Behavioral Neurobiology of Bird Song. Annals of the New York Academy of Sciences, 1016.
Zellner Keller, B. (2002). Revisiting the status of speech rhythm. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence. Aix-en-Provence, France: Laboratoire Parole et Langage.
Zemp, H. (1981). Melanesian solo polyphonic panpipe music. Ethnomusicology, 25:383–418.
Zetterholm, E. (2002). Intonation pattern and duration differences in imitated speech. In: B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody, Aix-en-Provence. Aix-en-Provence, France: Laboratoire Parole et Langage.
Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
Zohar, E., & Granot, R. (2006). How music moves: Musical parameters and listeners’ images of motion. Music Perception, 23:221–248.
The following sound and sound/video examples, referenced in this book on the pages listed, can be found at www.oup.com/us/patel
Sound Example 2.1: Four short tone sequences, corresponding to Figure 2.3, panels A-D (p. 23)
Sound Example 2.2: A short piano passage, followed by the same passage with the acoustic signal time-reversed (p. 29)
Sound Example 2.3: Eight tabla drum sounds, corresponding to Table 2.1 (cf. Sound Example 2.7 for corresponding vocables) (pp. 35, 36, 63)
Sound/Video Example 2.4: Movie of a professional tabla player uttering a sequence of tabla vocables, then playing the corresponding drum sequence (p. 37)
Sound Example 2.5: A short passage in Jukun, a discrete level tone language from Nigeria (p. 46)
Sound Example 2.6: A sentence of British English, corresponding to Figures 2.21a, 2.21b, and 2.21c (pp. 61, 62, 63)
Sound Example 2.7: Examples of eight tabla drum vocables (cf. Sound Example 2.3 for the corresponding drum sounds) (p. 63)
Sound Example 2.8: A sentence of sine wave speech and the original sentence from which it was created (p. 76)
Sound Example 3.1: A simple Western European folk melody (K0016), corresponding to Figure 3.1 (p. 99)
Sound Example 3.2: K0016 played twice, with two different indications of the beat (p. 100)
Sound Example 3.3: A strongly metrical rhythmic pattern (p. 101)
Sound Example 3.4: A weakly metrical (syncopated) rhythmic pattern, with frequent “silent beats” containing no events (p. 101)
Sound Example 3.5: A sentence of British English and a sentence of continental French, corresponding to Figures 3.8a and 3.8b (pp. 130, 131)
Sound Example 3.6: An English sentence transformed from its original version into an increasingly abstract temporal pattern of vowels and consonants (p. 136)
Sound Example 3.7: A Japanese sentence transformed from its original version into an increasingly abstract temporal pattern of vowels and consonants (p. 136)
Sound Example 3.8: A sentence of British English and a sentence of continental French, corresponding to Figure 3.15 (pp. 162, 163)
Sound Example 3.9: Easter song from the Bolivian Andes, played on a small guitar (charango) (p. 169)
Sound Example 3.10: Examples of tone sequences in which tones alternate in amplitude or duration (p. 170)
Sound Example 4.1: A sentence of British English, corresponding to Figures 4.2, 4.3, and 4.4 (pp. 186, 187, 188, 189)
Sound Example 4.2: A sentence of British English, corresponding to Figure 4.5 (p. 191)
Sound Example 4.3: A sentence of continental French, corresponding to Figure 4.6 (pp. 192, 193)
Sound Example 4.4: A synthesized sentence of English, with either English or French intonation (p. 192)
Sound Example 4.5: Two short melodies illustrating the influence of tonality relations on the perception of a tone’s stability (p. 201)
Sound Example 4.6: K0016 with a sour note (p. 201)
Sound Example 4.7: K0016 with a chordal accompaniment (p. 202)
Sound Example 4.8: A sentence of continental French, corresponding to Figure 4.9 (pp. 203, 204)
Sound Example 4.9: A pair of sentences and a corresponding pair of tone analogs (p. 227)
Sound Example 4.10: A pair of sentences, a corresponding pair of discrete-tone analogs, and a corresponding pair of gliding-pitch analogs (p. 231)
Sound Example 5.1: A musical phrase from J. S. Bach, corresponding to Figure 5.10 (p. 257)
Sound Example 5.2: Two chord progressions in which the final two chords are physically identical but have different harmonic functions (p. 260)
Sound Example 5.3: Musical chord sequences, corresponding to Figure 5.12 (p. 274)
Sound Example 7.1: A Mozart minuet in its original and dissonant versions (p. 380)
Sound Example 7.2: Short rhythmic sequences, corresponding to Figure 7.3 (p. 407)
Sound/Video Example 7.3: Movie of a female Asian elephant (Elephas maximus) in the Thai Elephant Orchestra, drumming on two Thai temple drums (p. 408)
Sound/Video Example 7.4: Movie of a red-masked parakeet or “cherry-headed conure” (Aratinga erythrogenys) moving to music (p. 411)
Figure 2.1: From Roger Shepard, “Structural representations of musical pitch,” in Diana Deutsch (Ed.), The Psychology of Music (pp. 343-390). © 1982 by Academic Press. Used by permission.
Figure 2.2b: From Worlds of Music: An Introduction to Music of the World’s Peoples, Shorter Ed., 1st ed. by Titon, 2001. Reprinted with permission of Wadsworth, a division of Thomson Learning: www.thomsonrights.com. Fax 800-730-2215.
Figure 2.3: From W. J. Dowling, S. Kwak, & M. W. Andrews, “The time course of recognition of novel melodies,” Perception and Psychophysics, 57:136-149. © 1995 by the Psychonomic Society. Used by permission.
Figure 2.8: From Rodolfo R. Llinás & Patricia Smith Churchland, Mind-Brain Continuum: Sensory Processes, figure 12.9, page 267. © 1996 by the Massachusetts Institute of Technology. Used by permission of MIT Press.
Figure 2.11: From B. Connell, “The perception of lexical tone in Mambila,” Language and Speech, 43:163-182. © 2000 by Kingston Press. Used by permission.
Figure 2.12: From J. A. Edmondson & K. J. Gregerson, “On five-level tone systems,” in S. J. Hwang & W. R. Merrifield (Eds.), Language in Context: Essays for Robert E. Longacre (pp. 555-576). Dallas, TX: Summer Institute of Linguistics. © 1992 by SIL International. Used by permission.
Figure 2.13: From K. Wedekind, “Thoughts when drawing a map of tone languages,” Afrikanistische Arbeitspapiere, 1:105-124. © 1985 by Klaus Wedekind. Used by permission.
Figure 2.16: From Peter Ladefoged & Ian Maddieson, The Sounds of the World’s Languages. © 1996 by Blackwell Publishing. Used by permission.
Figure 2.18: Reprinted with permission from G. E. Peterson & H. L. Barney, “Control methods used in the study of vowels,” Journal of the Acoustical Society of America, 24:175-184. © 1952 by the American Institute of Physics. Used by permission.
Figure 2.19: From Peter Ladefoged, A Course in Phonetics (with CD-ROM), 5th ed., 2006. Reprinted with permission of Heinle, a division of Thomson Learning: www.thomsonrights.com. Fax 800-730-2215.
Figure 2.23a, b: From P. Iverson, P. Kuhl, R. Akahane-Yamada, E. Diesch, Y. Tohkura, A. Kettermann, & C. Siebert, “A perceptual interference account of acquisition difficulties for non-native phonemes,” Cognition, 87:B47-B57. © 2003 by Elsevier. Used by permission.
Figure 2.24: From M. Tervaniemi, A. J. Szameitat, S. Kruck, E. Schröger, K. Alter, W. De Baene, & A. Friederici, “From air oscillations to music and speech: fMRI evidence for fine-tuned neural networks in audition,” Journal of Neuroscience, 26:8647-8652. © 2006 by the Society for Neuroscience. Used by permission.
Figure 2.25: From J. Maye & D. J. Weiss, “Statistical cues facilitate infants’ discrimination of difficult phonetic contrasts,” in B. Beachley et al. (Eds.), BUCLD 27 Proceedings (pp. 508-518). Somerville, MA: Cascadilla Press. © 2003 by J. Maye & D. J. Weiss. Used by permission.
Figure 3.1: From A. D. Patel, “A new approach to the cognitive neuroscience of melody,” in I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music. © 2003 by Oxford University Press. Used by permission.
Figure 3.2: From A. D. Patel, “A new approach to the cognitive neuroscience of melody,” in I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music. © 2003 by Oxford University Press. Used by permission.
Figure 3.5: From Bruce Hayes, “The prosodic hierarchy in meter,” in Paul Kiparsky & Gilbert Youmans (Eds.), Phonetics and Phonology (pp. 201-260). © 1989 by Academic Press. Used by permission.
Figure 3.7: From F. Ramus, M. Nespor, & J. Mehler, “Correlates of linguistic rhythm in the speech signal,” Cognition, 73:265-292. © 1999 by Elsevier. Used by permission.
Figure 3.10: From “Acoustic correlates of linguistic rhythm: Perspectives,” in B. Bell & I. Marlien (Eds.), Proceedings of Speech Prosody 2002, Aix-en-Provence. © 2002 by Franck Ramus. Used by permission.
Figure 3.11: From Elisabeth O. Selkirk, Phonology and Syntax: The Relation Between Sound and Structure, figure 2.2, p. 46. © 1984 by the Massachusetts Institute of Technology. Used by permission of MIT Press.
Figure 3.13: From D. Temperley, “Syncopation in rock: A perceptual perspective,” Popular Music, 18:19-40. © 1999 by Cambridge University Press. Used by permission.
Figure 3.16: From A. D. Patel & J. R. Daniele, “An empirical comparison of rhythm in language and music,” Cognition, 87:B35-B45. © 2003 by Elsevier. Used by permission.
Figure 3.18: From A. D. Patel & J. R. Daniele, “Stress-timed vs. syllable-timed music? A comment on Huron and Ollen (2003),” Music Perception, 21:273-276. © 2003 by the University of California Press. Used by permission.
Figure 4.1: From J. Terken, “Fundamental frequency and perceived prominence of accented syllables,” Journal of the Acoustical Society of America, 89:1768-1776. © 1991 by the American Institute of Physics. Used by permission.
Figure 4.7: From A. D. Patel, “A new approach to the cognitive neuroscience of melody,” in I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music. © 2003 by Oxford University Press. Used by permission.
Figure 4.8: From C. L. Krumhansl & E. J. Kessler, “Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys,” Psychological Review, 89:334-368. © 1982 by the American Psychological Association. Used by permission.
Figure 4.10: From J. ’t Hart, R. Collier, & A. Cohen, A Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody. Cambridge, UK: Cambridge University Press. © 1990 by Cambridge University Press. Used by permission.
Figure 4.11: From J. ’t Hart, R. Collier, & A. Cohen, A Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody. Cambridge, UK: Cambridge University Press. © 1990 by Cambridge University Press. Used by permission.
Figure 4.13: From P. G. Vos & J. M. Troost, “Ascending and descending melodic intervals: Statistical findings and their perceptual relevance,” Music Perception, 6:383-396. © 1989 by the University of California Press. Used by permission.
Figure 4.15: From A. D. Patel, J. R. Iversen, & J. Rosenberg, “Comparing the rhythm and melody of speech and music: The case of British English and French,” Journal of the Acoustical Society of America, 119:3034-3047. © 2006 by the American Institute of Physics. Used by permission.
Figure 4.17: From A. D. Patel, I. Peretz, M. Tramo, & R. Labrecque, “Processing prosodic and musical patterns: A neuropsychological investigation,” Brain and Language, 61:123-144. © 1998 by Elsevier. Used by permission.
Figure 4.18: From A. D. Patel, J. M. Foxton, & T. D. Griffiths, “Musically tone-deaf individuals have difficulty discriminating intonation contours extracted from speech,” Brain and Cognition, 59:310-313. © 2005 by Elsevier. Used by permission.
Figure 5.1: From E. Balaban, “Bird song syntax: Learned intraspecific variation is meaningful,” Proceedings of the National Academy of Sciences, USA, 85:3657-3660. © 1988 by the National Academy of Sciences, USA. Used by permission.
Figure 5.2: From L. L. Cuddy, A. J. Cohen, & D. J. K. Mewhort, “Perception of structure in short melodic sequences,” Journal of Experimental Psychology: Human Perception and Performance, 7:869-883. © 1981 by the American Psychological Association. Used by permission.
Figure 5.3: From C. L. Krumhansl, “The psychological representation of musical pitch in a tonal context,” Cognitive Psychology, 11:346-374. © 1979 by Elsevier. Used by permission.
Figure 5.4: From L. L. Cuddy, A. J. Cohen, & D. J. K. Mewhort, “Perception of structure in short melodic sequences,” Journal of Experimental Psychology: Human Perception and Performance, 7:869-883. © 1981 by the American Psychological Association. Used by permission.
Figure 5.5: From C. L. Krumhansl, J. J. Bharucha, & E. J. Kessler, “Perceived harmonic structure of chords in three related musical keys,” Journal of Experimental Psychology: Human Perception and Performance, 8:24-36. © 1982 by the American Psychological Association. Used by permission.
Figure 5.7: From C. L. Krumhansl & E. J. Kessler, “Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys,” Psychological Review, 89:334-368. © 1982 by the American Psychological Association. Used by permission.
Figure 5.8: From A. D. Patel, “Language, music, syntax, and the brain,” Nature Neuroscience, 6:674-681. © 2003 by Macmillan Publishers Ltd., Nature Neuroscience. Used by permission.
Figure 5.9: From E. W. Large, C. Palmer, & J. B. Pollack, “Reduced memory representations for music,” Cognitive Science, 19:53-96. © 1995 by the Cognitive Science Society. Used by permission.
Figure 5.10: Modified from F. Lerdahl, Tonal Pitch Space, 32. © 2001 by Oxford University Press. Used by permission.
Figure 5.11: From A. D. Patel, E. Gibson, J. Ratner, M. Besson, & P. Holcomb, “Processing syntactic relations in language and music: An event-related potential study,” Journal of Cognitive Neuroscience, 10:717-733. © 1998 by the Massachusetts Institute of Technology. Used by permission of MIT Press.
Figure 5.13: From A. D. Patel, E. Gibson, J. Ratner, M. Besson, & P. Holcomb, “Processing syntactic relations in language and music: An event-related potential study,” Journal of Cognitive Neuroscience, 10:717-733. © 1998 by the Massachusetts Institute of Technology. Used by permission of MIT Press.
Figure 5.14: From A. D. Patel, “Language, music, syntax, and the brain,” Nature Neuroscience, 6:674-681. © 2003 by Macmillan Publishers Ltd., Nature Neuroscience. Used by permission.
Figure 5.15: From F. Lerdahl, Tonal Pitch Space, 32. © 2001 by Oxford University Press. Used by permission.
Figure 5.17: From A. D. Patel, “The relationship of music to the melody of speech and to syntactic processing disorders in aphasia,” Annals of the New York Academy of Sciences, 1060:59-70. © 2005 by Blackwell Publishing. Used by permission.
Figure 5.20: From A. D. Patel, “The relationship of music to the melody of speech and to syntactic processing disorders in aphasia,” Annals of the New York Academy of Sciences, 1060:59-70. © 2005 by Blackwell Publishing. Used by permission.
Figure 6.1: From A. Gabrielsson & E. Lindström, “The influence of musical structure on emotional expression,” in P. N. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory and Research (pp. 223-248). © 2001 by Oxford University Press. Used by permission.
Figure 6.2: From C. L. Krumhansl, “Topic in music: An empirical study of memorability, openness, and emotion in Mozart’s String Quintet in C Major and Beethoven’s String Quartet in A Minor,” Music Perception, 16:119-134. © 1998 by the University of California Press. Used by permission.
Table 6.2: From P. N. Juslin & P. Laukka, “Communication of emotions in vocal expression and music performance: Different channels, same code?” Psychological Bulletin, 129:770-814. © 2003 by the American Psychological Association. Used by permission.
Figure 6.3: From R. Hacohen & N. Wagner, “The communicative force of Wagner’s leitmotifs: Complementary relationships between their connotations and denotations,” Music Perception, 14:445-476. © 1997 by the University of California Press. Used by permission.
Figure 6.4: From R. Hacohen & N. Wagner, “The communicative force of Wagner’s leitmotifs: Complementary relationships between their connotations and denotations,” Music Perception, 14:445-476. © 1997 by the University of California Press. Used by permission.
Figure 6.5: From S. Koelsch, E. Kasper, D. Sammler, K. Schulze, T. Gunter, & A. D. Friederici, “Music, language, and meaning: Brain signatures of semantic processing,” Nature Neuroscience, 7:302-307. © 2004 by Macmillan Publishers Ltd., Nature Neuroscience. Used by permission.
Figure 6.6: From F. Wolf & E. Gibson, “Representing discourse coherence: A corpus-based study,” Computational Linguistics, 31:249-287. © 2005 by the Massachusetts Institute of Technology. Used by permission of MIT Press.
Figure 7.2: From R. J. Greenspan & T. Tully, “Group report: How do genes set up behavior?” in R. J. Greenspan & C. P. Kyriacou (Eds.), Flexibility and Constraint in Behavioral Systems (pp. 65-80). © 1994 by John Wiley & Sons, Limited. Used by permission.
Figure 7.3: From T. R. Bergeson & S. E. Trehub, “Infants’ perception of rhythmic patterns,” Music Perception, 23:345-360. © 2006 by the University of California Press. Used by permission.
Abercrombie, D., 120–121, 152
Abraham, G., 159, 161
Acker, B., 81
Adams, S., 154–155
Adolphs, R., 348
Agawu, K., 154, 321
Aiello, R., 174
Alain, C., 27
Alcock, K. J., 366, 388–389, 405
Allen, G., 230, 391
Altenmüller, E., 175, 401
Altmann, S., 10
Anvari, S., 78–79, 387
Arcadi, A. C., 409
Arom, S., 18
Arvaniti, A., 124, 142, 157, 209, 213n.11, 235
Ashley, R., 114
Atterer, M., 209
Ayari, M., 17, 302
Ayotte, J., 175–176, 185, 201, 229–231, 268–269, 284, 357, 369, 372, 391–392
Baker, M., 171
Balaban, E., 87, 195, 243, 378, 390
Balkwill, L.-L., 309, 312–314, 345, 347
Balzano, G. J., 20
Baptista, L. F., 355
Barlow, H., 163–164
Barrett, S., 81–82
Baum, S., 228n.17
Becker, J., 241, 301, 314, 316n.6, 324–326, 417
Beckman, M. E., 111n.6, 121, 188, 191, 207
Beeman, M., 340–341
Belin, P., 350
Belleville, S., 228
Bellugi, U., 361n.6, 364, 388n.14
Benamou, M., 314
Bengtsson, S. L., 374
Bergeson, T. R., 381, 406–407
Bernstein, L., 4, 92, 240, 259, 263, 298
Besson, 248, 253, 260, 271–275, 286, 335, 349n.16
Best, C., 69–70, 74, 85, 382
Bharucha, J. J., 201, 254–255, 260, 262, 282, 294–295, 297
Bickerton, D., 365
Bigand, E., 26, 101, 200, 258–260, 262, 282, 287, 294–295, 297, 307, 309, 341, 350, 375–376
Blacking, J., 217, 405n.20
Blood, A., 318, 347
Blumstein, S. E., 297
Bolinger, D., 122, 124–126, 131, 133, 135, 194, 207, 238
Bolton, T., 169
Boltz, M. G., 195, 201, 203, 403
Bonnel, A.-M., 286
Boone, G., 158n.25
Bradlow, A. R., 48n.19, 113, 116
Brattico, E., 27
Bregman, A., 219, 259, 319, 386, 397
Brinner, B., 100
Brown, S., 276, 367
Brownell, H. H., 340–341
Bruce, G., 126, 207
Buck, J., 409
Burnham, D., 47
Burns, E. M., 11, 20, 25
Busnel, R. G., 49n.21
Campbell, W. N., 147
Caplan, D., 270, 276, 292, 341
Cariani, P. A., 91
Carlsen, J. C., 196
Carrington, J. F., 48–49
Casasanto, D., 326n.11
Castellano, M. A., 199n.8, 302
Catchpole, C., 355
Chafe, W., 321
Chandola, A., 36
Chenoweth, V., 11
Chomsky, N., 38, 109, 240, 263
Christiansen, M. H., 277, 359, 367
Clark, A., 401
Clarke, E. F., 109, 112, 116, 319–320
Classe, A., 49
Clough, J., 17, 19
Clynes, M., 313
Cogan, R., 28, 53
Cohen, A. J., 84, 109, 193, 305, 372n.11
Cohen, D., 344–345
Collier, R., 184, 202, 213
Coltheart, M., 72, 225, 233, 268, 357
Comrie, B., 39
Connell, B., 40, 42–43
Cook, N., 201, 254, 307, 317
Cook, N. D., 92
Cooke, D., 304, 310
Cooper, G. W., 139
Cooper, W. E., 108, 140
Coppola, M., 365
Costa-Giomi, E., 373
Courtney, D., 34
Cross, I., 169, 328, 355, 369
Crowder, R. G., 24
Cuddy, L. L., 84, 195–196, 199, 202, 225, 228n.17, 229, 246, 248, 249, 252–253, 258, 261, 269, 302, 357
Cummins, F., 152
Cutler, A., 62, 142, 145–146, 148, 172, 186, 210, 414
d’Alessandro, C., 215
Dahlhaus, C., 266
Dainora, A., 208
Dalla Bella, S., 117, 176, 185, 309
Damasio, A., 313, 319
Daniele, J. R., 161, 164–167, 222, 225
Darwin, C., 4, 367–368, 371
Darwin, C. J., 144–146
Dauer, R. M., 100, 121–126, 129, 141, 152
Davidson, L., 202, 373
Davies, S., 304, 308, 313, 317
Davis, M. H., 76n.29
de Jong, K. J., 142
de Pijper, J. R., 112, 191, 213
Deacon, T. W., 359, 366–367
DeCasper, A. J., 382
Delattre, P., 123, 192
Delgutte, B., 13n.4, 91, 386, 399
Deliege, I., 108–109, 193, 307
Delius, F., 164
Dell, F., 158
Dellwo, V., 131, 133n.17, 166
Demany, L., 13, 77, 228, 398, 406
Denora, T., 315, 317, 324
Desain, P., 102, 173
Deutsch, D., 46–48, 280, 393–394
Di Cristo, A., 192, 205
Dibben, N., 317, 319
Diehl, R. L., 46, 56, 75–76, 170, 217, 320
Dilley, L., 110, 153n.24, 210, 222
Dissanayake, E., 370
Docherty, G., 323
Doupe, A. J., 360, 379, 411
Dowling, W. J., 13, 15, 17, 23–24, 26, 106, 194–195, 236, 245n.2, 373
Drake, C., 100–101, 144, 376, 403, 405–406, 415
Drayna, D., 201, 230, 237, 358, 369, 391–392
Dunbar, R. I., 358n.3, 370
Dupoux, E., 138
爱德曼,通用汽车,417
Edelman, G. M., 417
Eerola, T., 101, 405
Eimas, P. D., 24, 76
埃克曼,P.,316
Ekman, P., 316
Elbert, T., 174, 401
埃利斯,A.,16 岁
Ellis, A., 16
Elman, J. L., 267, 359
Emmorey, K., 360–361, 364, 376
Everett, D. L., 3, 326n.11
Faber, D., 97, 125
福尔克,D.,370
Falk, D., 370
Fant, G., 38 n. 15 , 155 , 166 , 168 , 192
Fant, G., 38n.15, 155, 166, 168, 192
Fedorenko, E., 279, 287–288
Feld, S., 217, 241
Fernald, A., 195, 386
F. 费雷拉,109
Ferreira, F., 109
Fishman, Y. I., 22, 397
惠誉, WT, 355 , 360 – 361 , 367 – 368 ,395, 409 – 410
Fitch, W. T., 355, 360–361, 367–368,395, 409–410
Fodor, J. A., 241, 267
Fougeron, C., 110, 192, 193
福克斯, P., 323
Foulkes, P., 323
Fowler, C. A., 184n.3, 320
Foxton, JM, 79 n. 31 , 176 , 230 – 238 , 284 , 357 , 391 – 392
Foxton, J. M., 79n.31, 176, 230–238, 284, 357, 391–392
Fraisse, P., 100, 112
弗朗西丝 R., 245 , 285 , 290 , 297
Frances, R., 245, 285, 290, 297
A.弗里伯格,320
Friberg, A., 320
弗里德里希,公元73岁,174 岁,285 岁,288 岁,297 岁
Friederici, A. D., 73, 174, 285, 288, 297
美国弗里斯,357
Frith, U., 357
V. 弗洛金,40 岁
Fromkin, V., 40
Fry, D., 24 , 201 , 230 , 357 – 358 , 369 , 392
Fry, D., 24, 201, 230, 357–358, 369, 392
藤冈 T.,27岁
Fujioka, T., 27
Fussell, P., 154–155
N. 加布,79 岁
Gaab, N., 79
加布里埃尔森, A., 116 – 117 , 138 , 309 , 311 , 312 , 317
Gabrielsson, A., 116–117, 138, 309, 311, 312, 317
甘杜尔,J.,75岁
Gandour, J., 75
Gardner, H., 202, 373
加菲亚斯,R.,159
Garfias, R., 159
加塞尔,C.,270
Gaser, C., 270
Geissmann, T., 367n.7, 411n.23
根特纳、TQ 、10、244、265、395 _ _
Gentner, T. Q., 10, 244, 265, 395
Gerhardt, H. C., 100, 408, 411n.23
Gerken, L. A., 309, 312
Gibson, E., 110, 253, 271–275, 277–279, 284–285, 292, 337–340
Gibson, J. J., 319
Giguère, J.-F., 185
Gjerdingen, R. O., 261, 321
Goldin-Meadow, S., 300n.2, 326n.11, 365
Goldstein, A., 317–318, 319n.10
Gopnik, M., 388
Gordon, P. C., 278n.12
Goto, H., 68
Grabe, E., 121, 124, 131, 133–135, 140, 150, 161, 166, 206, 208
Grahn, J., 410
Greenberg, S., 39n.16, 113, 123
Greenfield, M. D., 408–409
Greenspan, R., 390, 391
Gregory, A. H., 106, 310
Grey, J., 30
Griffiths, T. D., 226, 228n.17, 231–233, 236, 392
Grodner, D. J., 278
Grosjean, F., 109
Gross, H., 154
Grout, D. J., 165, 224
Guenther, F. H., 80, 82
Gunter, T. C., 288
Gussenhoven, C., 111, 198
Gut, U., 135
Haarmann, H. J., 283, 285, 297
Hacohen, R., 328–332
Haesler, S., 389
Hagoort, P., 271–272, 285, 291, 332
Hajda, J. A., 29
Hale, J., 279
Hall, M. D., 76, 81
Hall, R. A., Jr., 161, 222
Halle, J., 155, 158
Halle, M., 38, 109, 139, 168
Halliday, 197
Halpern, A. R., 25
Handel, S., 24, 72, 96
Hannon, E. E., 98, 102, 157, 194, 377, 383, 385, 406, 412–415
Hanslick, E., 300n.1, 306
Hargreaves, D. J., 323, 324
Haspelmath, M., 173
Hatten, R., 321
Hauser, M., 94, 184n.3, 244, 265, 355, 361, 377, 381, 395–397, 399, 402, 408, 410
Hawkins, S., 81, 134
Hay, J., 170
Hayes, B., 110, 111, 161n.26, 170
Heaton, P., 371
Hebert, S., 357
Heeschen, C., 167
Helmholtz, H. von., 16, 88
Hepper, P., 382–383
Herman, L. M., 355
Hermeren, 301
Hermes, D., 140, 190, 213–215, 222, 223n.15, 403
Herzog, G., 12n.2, 48, 199n.8, 217
Hevner, K., 309, 311, 314
Hickok, G., 75
Hillyard, S., 271, 332
Hinton, L., 12n.2, 62
Hirschberg, J., 207–208
Hirsh-Pasek, K., 108
Hirst, D., 205
Hobbs, J. R., 336, 338
Hockett, C. A., 10
Holcomb, P., 253, 271–275, 341
Hollander, J., 154
Holm, J., 365
Holst, I., 301n.3
Holt, L. L., 22, 24, 76, 320
Honing, H., 102, 320
Horton, T., 241
House, D., 214–215
Huber, F., 100, 408, 411n.23
Hughes, D., 62
Hulse, S. H., 10, 13, 393, 396–399
Huron, D., 84, 165–166, 178, 194, 197, 201, 203, 219, 249, 259, 282, 305, 369
Husain, F., 236
Hutchins, S., 261
Huttenlocher, P., 363, 401
Hyde, K., 176, 228, 230–231, 237, 284, 391–394
Hyman, L. M., 39n.15, 43
Idsardi, W., 71, 139
Ilie, G., 349–350
Iversen, J. R., 36, 63, 65, 79, 101, 105, 107, 162, 165, 170–172, 176, 222, 291, 295, 403–404, 407–408, 410
Iverson, P., 11, 68, 69, 80
Ives, C., 9n.1
Izumi, A., 396–397
Jackendoff, R., 93, 103, 105–106, 109, 139, 190–191, 201–202, 240–241, 242n.1, 244, 254–259, 261, 263, 265–266, 307–308, 358
Jaeger, F., 279–280
Jairazbhoy, N. A., 16, 22, 88
Jakobson, R., 38, 168–169, 306
James, W., 367
Janata, P., 201, 253, 410
Jarvis, E. D., 389, 411
Johnson, K., 52
Johnsrude, I. S., 73, 76n.29, 236
Jones, M. R., 100–102, 105, 145, 151, 195, 202–203, 403, 405
Jongsma, M. L. A., 105
Jun, S.-A., 110–111, 119, 182, 192, 193, 194, 207, 362
Jungers, M., 116–117
Jurafsky, D., 279
Jusczyk, P. N., 24, 76, 108, 116, 128, 137, 192, 218, 382
Juslin, P., 116, 206, 309, 315, 317, 345–348
Just, M. A., 288, 341
Justus, T. C., 245, 294, 377, 402
Kaan, E., 272, 283–285, 297
Kalmus, H., 201, 230, 357, 369, 392
Karmiloff-Smith, A., 357, 388
Kazanina, N., 71
Keane, E., 133n.17
Keating, P., 110
Kehler, A., 336–338
Keiler, A., 240
Kelly, M. H., 141–142, 156
Kendall, R., 29, 116
Kessler, E. J., 199–200, 252, 302, 399
King, J., 288
Kippen, J., 34, 62
Kirkpatrick, R., 159
Kisilevsky, B. S., 383
Kivy, P., 304, 306–308, 313, 323, 327, 335, 344
Klatt, D., 113, 360
Klima, E., 361n.6, 364
Kluender, K. R., 76
Knösche, T. R., 174
Koelsch, S., 260, 275–276, 284, 287–288, 331, 333–335, 373
Kolk, H. H., 283, 285, 297
Konečni, V. J., 307
Kotz, S., 389
Kraljic, T. C., 85
Kramer, L., 304
Krumhansl, C. L., 18, 21, 26, 33–34, 84–85, 101, 105, 108–109, 192, 196, 199, 200, 225, 242, 245, 247, 250, 252, 258, 260, 262, 264, 266, 281, 302, 309, 316, 321–322, 347, 399, 405
Kuhl, P. K., 24, 56–57, 59, 69–70, 76, 80, 84–85, 195, 209, 247, 360–362, 385n.13, 386, 395
Kuperberg, G. R., 341
Kusumoto, K., 170
Kutas, M., 271, 332
Labov, W., 323
Ladd, D. R., 25, 44–46, 194, 203, 206–209, 213n.11, 225, 334, 350
Ladefoged, P., 54, 55n.26, 56, 59, 65–66, 121, 213
Lai, C. S. L., 389
Langacker, R. W., 303
Langer, S., 317, 320, 367
Large, E. W., 101–102, 116, 145, 151, 255, 256
LeDoux, J., 319
Lee, C. S., 133n.17, 150, 161
Lehiste, I., 100, 119, 129, 143–144, 146, 156
Leman, M., 259–260
Lenneberg, E., 363
Lerdahl, F., 93, 103, 105–106, 109, 139, 155, 190–191, 201–202, 206n.10, 240–242, 254–258, 260–261, 263, 265, 280–281, 307
Levelt, W. J. M., 89, 121, 264
Levinson, G., 307, 337
Lévi-Strauss, C., 300
Levitin, D. J., 23, 276, 318, 350, 375, 388n.14, 393–394
Levy, R., 279
Lewis, R. L., 280
Liberman, A. M., 72, 76
Liberman, M., 97, 126, 139–140, 183n.1
Lieberman, D. E., 367
Lieberman, P., 183, 360, 390
Liégeois, P., 389
Liégeois-Chauvel, C., 75, 175, 236
Lindblom, B., 56, 58, 81n.32, 209, 240
Lochy, A., 228n.17, 233–235
Locke, D., 48, 62, 77n.30, 98
Locke, J. L., 359
Löfqvist, A., 213
Longhi, E., 380, 405
Lotto, A. J., 80
Low, E. L., 121, 124, 128, 131, 133, 135, 150, 161
Luce, P. A., 85, 116
Luria, A. R., 268, 270
Lynch, M. P., 83, 383
Maddieson, I., 39 , 41 – 42 , 51 , 54 – 56 , 65
Maddieson, I., 39, 41–42, 51, 54–56, 65
Maess, B., 275
Magne, C., 349n.16
Marcus, G. F., 366, 389
Marin, O. S. M., 4, 226, 268, 285
Marler, P., 10, 244, 259, 356, 360, 363, 379
Marslen-Wilson, W., 279
Martin, J. G., 145
Marvin, E. W., 47, 307, 393
Mason, R. A., 341
Mattys, S., 149–150
Mayberry, R. I., 363
Maye, J., 84–85, 385
McAdams, S., 17, 30, 33–34, 197, 302
McDermott, J., 94, 355, 377, 381, 395–397, 399, 402, 408
McKinney, M. F., 13n.4, 399
McLucas, A., 214
McMullen, E., 72, 85, 384
McNeill, W. H., 402
McQueen, J. M., 85
Meck, W. H., 404, 410
Mehler, J., 126–130, 136–137, 148, 150, 161, 165, 225, 382
Menon, V., 276, 318, 350
Merker, B., 10, 361, 367, 370, 403, 410
Mertens, P., 188, 189, 214–215
Merzenich, M., 78, 400n.19, 401
Meyer, L. B., 17, 139, 201, 205, 254, 303–306, 308, 318, 323, 338
Miller, G., 368–369
Miller, G. A., 13, 19, 356
Miller, L. K., 371
Miranda, R., 335
Mithen, S., 368, 371, 411n.24
Moelants, D., 100
Monelle, R., 304, 321
Moon, C., 382
Moore, B. C. J., 87, 186n.4
Moreton, E., 170
Morgan, J. L., 172, 382
Morgenstern, S., 163–164
Morley, I., 370
Morrongiello, B. A., 406
Munhall, K., 53n.24, 228n.17
Munte, T., 175, 401
Näätänen, R., 26–28, 71
Nakata, T., 370, 381–382
Narmour, E., 196, 205, 219, 305, 338
Nattiez, J. J., 304–306, 350
Nazzi, T., 137, 161, 165, 225, 382
Nearey, T., 56
Nespor, M., 109, 124, 127, 142
Nettl, B., 3, 11, 18–19, 402
Neubauer, J., 4
Newport, E. L., 363, 374, 395
Nketia, K. J. H., 18
Nolan, F., 131, 135, 161, 223
Nord, L., 155
North, A. C., 323–324
Norton, A., 387
Ochs, E., 195
Ohala, J. J., 186n.5, 360
Ohgushi, K., 168, 170–173
Ortony, A., 315
Ostendorf, M., 110–111
Osterhout, L., 271–272
Overy, K., 79
Paavilainen, P., 28
Palmer, C., 101, 105, 114, 116–117, 156, 405
Panksepp, J., 319
Pannekamp, A., 174
Pantaleoni, H., 11, 98
Pantev, C., 27, 401
Papousek, M., 381, 386, 405
Parncutt, R., 92, 100, 258–259
Parsons, L. M., 75, 276
Partee, 303, 335
Pascual-Leone, 401
Pastore, R. E., 76, 81
Patel, A. D., 36, 51n.22, 63, 65, 75, 79, 87, 101, 105, 134n.18, 153, 158, 161–165, 167, 170, 176, 183n.2, 195, 205, 218, 222–228, 229, 231–233, 236, 253, 268, 271–276, 284, 291–296, 392, 403–404, 407–408, 410–411
Payne, K., 10, 355
Pearl, J., 183n.2
Penel, A., 101, 376
佩珀伯格,IM,355
Pepperberg, I. M., 355
Peretz, I., 4, 72, 75, 106, 109, 175–176, 185, 201, 225–231, 233, 237, 268–270, 284, 286, 291, 297, 309, 347, 357, 369, 372, 380, 391–392, 394, 401
Perlman, M., 18, 26, 85, 100
Perry, D. W., 4, 226, 268, 285
Pesetsky, D., 241
Petitto, L. A., 360
Pfordresher, P. Q., 203, 405
Phillips, C., 71, 272, 279
Pierrehumbert, J., 183n.1, 184, 191, 207–210
Pike, K. N., 97, 119–121, 207
Pinker, S., 358, 367, 400
Pisoni, D. B., 22, 80, 116
Piston, W., 249
Pitt, M. A., 145–146, 403
Plack, C. J., 87
Plantinga, J., 396
Plomp, R., 89
Poeppel, D., 72, 74–75
Polka, L., 84, 361
Port, R. F., 152
Poulin-Charronnat, B., 26, 260, 286–287
Povel, D. J., 102, 202, 406–407, 413
Powers, H. S., 241
Pressing, J., 98, 175
Price, C. J., 284, 388–389
Price, P., 111
Pullum, G., 326
Purves, D., 92, 94n.35
Pyers, J., 369
Racette, 297
Raffman, D., 317
Rakowski, A., 22, 44
Ramus, F., 79, 126–130, 133–134, 136–137, 149–150, 153, 161, 165–166, 225
Randel, D. M., 12
Ratner, L., 321
Rauschecker, J. P., 236
Reck, D., 11
Regnault, P., 260
Remez, R. E., 76–77
Repp, B. H., 24, 101–102, 105, 108, 114–115
Rialland, A., 49
Riemann, H., 266–267
Risset, J.-C., 30, 86
Roach, P., 121
Rohrmeier, M., 241
Rosch, E., 21, 245
Rosen, S., 25, 60, 79
Rosenberg, J. C., 162, 165, 222
Ross, D., 394
Ross, E., 348
Ross, J., 112, 156
Rossi, M., 213
Rothstein, E., 303
Russolo, L., 28n.12
Rymer, R., 363
Sacks, O., 103, 229, 350
Sadakata, M., 173
Saffran, J. R., 47, 72, 80, 84–85, 224–225, 384–385, 395, 415
Samuel, A. G., 85, 145–146, 403
Savage-Rumbaugh, S., 355, 409
Schaefer, R., 109
Schafer, A. J., 186
Schaffrath, H., 99
Schegloff, E. A., 113
Scheirer, E., 384
Schellenberg, E. G., 23, 93–94, 195–197, 344–345, 348–349, 370, 381, 384, 394, 398
Schenker, H., 201, 254, 307
Scherer, K., 45, 312, 315, 317, 344–345, 349
Schieffelin, B., 195, 364
Schlaug, G., 270, 387
Schmuckler, M. A., 76, 197, 282
Schulkind, M., 108
Schwartz, D. A., 92, 94n.35
Scott, D. R., 144, 147
Scott, S. K., 73
Seashore, C., 207
Sebeok, T. A., 49
Seeger, A., 324
Selfridge-Field, E., 99
Selkirk, E. O., 96–97, 109, 139–140, 403
Semendeferi, K., 367
Senghas, A., 365
Sethares, W. A., 15n.6
Shaheen, S., 184
Shamma, S., 87, 236
Shattuck-Hufnagel, S., 110–111, 119, 126, 139–140
Shebalin, V. G., 270
Shepard, R., 13, 14, 22, 199n.8, 302
Slaney, M., 384
Slevc, L. R., 78–79, 287, 289–290, 387
Sloboda, J. A., 105–106, 202, 229, 261, 308, 310, 315, 317–318, 323–324, 370
Sluijter, A. J. M., 119n.9
Smith, J. D., 25–26, 249
Snyder, J. S., 101–102
Sober, E., 359
Speer, S. R., 116–117, 186
Spencer, H., 317, 345
Steele, J., 182, 186–187, 211
Steinbeis, N., 281, 335
Steinhauer, K., 174
Steinke, W. R., 199
Steinschneider, M., 22
Stevens, K. N., 11, 52n.23, 56, 57, 59, 362
Stewart, L., 72, 226, 357
Stone, R. M., 12n.2
Strogatz, S., 100
Sundberg, J., 55, 62, 115, 207, 240, 320, 345
Swaab, T., 283, 285, 297
Swain, J., 241, 256
’t Hart, J., 40, 182, 198, 202, 211–212, 214, 215n.12
Takeuchi, A., 393, 399n.18
Tallal, P., 79
Tan, S-L., 307, 323–324
Taylor, D. S., 97
Tekman, H. G., 260, 294–295
Temperley, D., 98, 102, 156–157
Terhardt, E., 92, 94n.35
Terken, J., 140, 184–185, 206, 403
Tervaniemi, M., 27, 73–74, 237
Thaut, M. H., 103
Thierry, E., 49
Thomassen, J., 194
Thompson, W. F., 253, 309, 312–314, 343–345, 348–350
Tillmann, B., 260, 262, 276, 282, 284, 287, 293–294, 296–297, 307
Titon, J. T., 11
Todd, N. P., 102, 109, 112, 114, 133n.17, 150, 161, 320
Toga, A. W., 237
Toiviainen, P., 101–102, 196, 262, 302, 399, 405
Tojo, S., 241
Tomasello, M., 355, 359
Trainor, L. J., 27, 47, 78–79, 83, 85, 94, 173, 195, 246–247, 249, 253, 344–345, 347, 370, 372–375, 380–383, 387, 396–398, 405–406, 415
Tramo, M., 91, 226–228, 269, 296, 386
Trehub, S., 21, 23, 83, 93–94, 98, 195, 246–247, 249, 253, 344–345, 370, 372–373, 377, 379–386, 394, 396, 398, 406–407, 412–413
Trevarthen, C., 380
Tronick, E. Z., 381
Tsukada, K., 37
Turk, A., 110, 119, 139
Tyack, P. L., 10
Tzortzis, C., 270
Ullman, M. T., 276, 335
Vaissiere, J., 46
van de Weijer, 147, 364, 384
van Gulik, R., 97
van Noorden, 100
Van Valin, R. D., 242
Vargha-Khadem, F., 366, 388–389, 405
Vasishth, S., 280
Vassilakis, P., 90
Vogel, I., 109, 142
Voisin, F., 19
von Hippel, 197
Vos, P., 94, 219
Wagner, N., 328–332
Wallin, N. L., 367
Ward, W. D., 20, 25, 393
Wassenaar, M., 291
Watanabe, S., 398
Waters, G., 276
Watt, R. J., 329
Wedekind, K., 41, 43, 50
Weisman, R. G., 394, 396
Welmers, W. E., 43–44, 46
Wenk, B. J., 160–161
Werker, J. F., 69–70, 83, 361–362
Wessel, D., 30, 33
Whalen, D. H., 46, 51, 213, 316n.4
Whaling, C. S., 356, 379
White, L., 149–150
Wightman, C. W., 111
Will, U., 15n.5, 17
Willems, N., 225
Wilson, E. O., 417
Wilson, S. J., 175
Winner, E., 387
Wolf, F., 337–340
Wong, P. C. M., 46, 75, 79, 217, 349n.16
Woodrow, H. A., 169
Wright, A. A., 13, 398–399
Xu, Y., 46, 51, 187n.6, 235
Yamamoto, F., 157
Yung, B., 217
Zatorre, R. J., 25, 72, 74, 228, 237, 318, 350, 375, 394–395
Zbikowski, L. M., 304, 342
Zemp, H., 16, 20, 41, 88
Zipf, G. K., 219
absolute pitch
genetics, 393–395
in infants, 47
in music, 375, 393
in nonhuman animals, 396
in speech, 46–48
accent (prominence)
in language, 118–119. See also stress
in music, 104–105
accent, foreign. See foreign accent
adaptationist hypotheses for music. See evolution
African
drumming, 98
talking drums. See talking drums
tone languages, 40–41, 43–46
xylophones, 18
amusia
acquired, 226
congenital. See tone-deafness
dissociation from aphasia, 268–270
and speech intonation perception, 226–228, 230–238
animal
language abilities, 355
musical abilities, 395–400
song vs. human music, 355–356
aphasia
vs. auditory agnosia, 270
and perception of musical syntax, 284–285, 290–297
aprosodia, 348
auditory agnosia (vs. aphasia), 270
Australian aboriginal music, 15n.5, 17, 32
autism
and absolute pitch, 393
and musical appreciation, 371
autosegmental-metrical (AM) theory. See intonation
babbling, 359–360
Bach, Johann Sebastian, 257, 289, 301, 342–344
Balkan rhythms, 98, 157, 383, 412–413
basal ganglia
and language, 389–390
and music, 404–405, 410–411
beat. See rhythm
Beatles, The, 157, 342
Beethoven, Ludwig van, 108, 300–302, 321, 395
Benčnon language and music, 50
Beowulf, 155
Bernstein, Leonard, 4, 92, 240, 259, 263
birds
absolute vs. relative pitch perception, 396
innate perceptual predispositions, 378
perception of music by, 398, 411
song function, 10
song learning, 356, 378–379
song syntax, 242–243, 265
song vs. human music, 355–356
Blackwood, Easley, 9n.1
Blake, William, 156
bonobos. See chimpanzees and bonobos
brain
damage and music. See amusia
diffusion tensor imaging. See DTI
distinct regions for language and music, 73
event-related potentials. See ERP
functional magnetic resonance imaging. See fMRI
magnetoencephalography. See MEG
mismatch negativity. See MMN
overlapping regions for language and music, 275–276
British music. See English music
Broca’s aphasia, 285, 291–297
Broca’s area, 276, 295
Bulgarian music, 90
Cage, John, 12
Canary Islands, 49n.21
Catalan speech rhythm, 124, 127, 134–136
categorical perception
of pitch intervals, 24–26
of rhythm patterns, 112
of speech sounds, 24–25, 76, 80–81, 395
chimpanzees and bonobos
and language, 355
and music, 397, 409–410
Chinantec language, 49–50
Chinese
language. See Mandarin
music, 16, 62, 88, 97–98, 202. See also opera
chords, musical, 248–250
chroma. See pitch class
clicks, linguistic, 70, 74–75
congenital amusia. See tone deafness
consonance and dissonance
as a basis for musical intervals, 88–91
perception by nonhuman animals, 396–398
consonants
perception of nonnative contrasts, 68–70
place vs. manner classification, 53, 55
cortisol, 370
critical periods
for language, 362–363
for music, 374–375
cultural diversity of music, 16–19, 97–99, 301–303
Darwin, Charles, 4, 367–368, 371
Debussy, Claude, 20, 21n.9, 158–159, 162, 164–165, 222, 224
dependency-locality theory (DLT), 277–278
discourse and music. See pragmatics and music
dissociations
for perceiving spoken vs. musical sounds, 73
for perceiving tonality vs. linguistic grammar, 268–270
distinctive features, 38
DNA, compared with speech and music, 11
Donne, John, 417
DTI (Diffusion tensor imaging), 374n.12
elephant drumming, 408
Elgar, Edward, 162–165, 222, 224
emotion
acoustic cues in speech and music, 345–347, 350
and expectancy. See expectancy
experience by listeners, 315–319, 349–350
expression by music, 309–315, 350
English
music in relation to prosody, 159–166, 222–225
prosody, 119–137, 188–192
ERAN (ERP), 276
ERP (event-related potential), 26–28, 71, 174, 248, 271–276, 286–288, 292, 331–335
Estonian
language, 55n.26, 71
verse, 156
Ethiopian language and music, 50
event hierarchies. See hierarchical structure
evolution
adaptationist hypotheses for music, 368–371
language and natural selection, 358–367
music and natural selection, 367–377
expectancy
harmonic, 249, 289, 306, 318, 338
linguistic/pragmatic, 338–339
linguistic/prosodic, 145–147, 184
linguistic/semantic, 271, 289–290
linguistic/syntactic, 278–280, 282, 287, 289–290
melodic, 84, 196–197, 219, 305–306
and musical emotion, 308, 318
and musical syntax, 242
rhythmic, 102, 108, 146, 155, 403–406, 415
expressive timing in music. See rhythm
fetal auditory perception, 383
fireflies, 409
fMRI (functional magnetic resonance imaging), 73, 82, 253, 276, 283, 350, 410
foreign accent
and speech rhythm, 97, 120, 148–149
syndrome, 175
formants, 56–61
FOXP2 (gene), 365–366, 372, 388–391
French
music in relation to prosody, 159–166, 222–225
prosody, 119–124, 126–138, 192–193
functional harmony. See harmony
Galilei, Vincenzo, 4
gamelan music, 18, 20, 241, 300–302, 314, 325–326
Generative Theory of Tonal Music (GTTM), 254, 257, 263, 265. See also time-span reduction, prolongation reduction
genetics
and language, 365–366, 388–391
and music, 387–395
German
music in relation to prosody, 158, 166–168, 178–179
prosody, 133, 209
Gershwin, George, 374
Gestalt auditory principles, 76, 196, 199, 305, 379, 386
Ghanaian drumming, 98. See also African drumming
gibbon song, 361n.7, 411n.23
Glinka, Mikhail, 159
Greek speech rhythm, 142, 157
grouping. See rhythm
harmony
chord functions, 259–261, 265–267
circle of fifths, 251
general features, 244–253
Havasupai music, 12n.2
Hawaiian language, 122
hemispheric asymmetries, 73–76
Heschl’s gyrus, 73–74, 236
hierarchical structure. See also prolongation reduction; syntax; time-span reduction
in linguistic syntax, 253
musical event hierarchies, 201–202, 254–258
musical pitch hierarchies, 198–201, 245
Hindi speech sounds and music, 36, 63–67
Hume, David, 336
imagery evoked by music, 323
implied harmony, 202–203, 249
Indian
drumming (tabla), 35, 53, 63–67, 104n.2
musical scales, 17–18, 20
ragas and emotion, 313–314
speech sounds (vocables) in relation to music, 62–67
infant-directed singing, 344, 370, 377, 379, 381–382
infant-directed speech, 57, 195, 197, 344, 370, 377, 379, 381–382
infants
and music perception, 13, 21, 47, 80, 82–83, 93–94, 108, 116, 195, 224–225, 246–247, 372–373, 377, 379–387, 396–399, 405–408, 412–415
and speech perception, 24, 61–62, 69, 71, 80, 82, 84–85, 108, 128–129, 137, 147, 195, 209, 224–225, 247, 361–362, 364, 383
inferencing processes. See pragmatics
innate perceptual predispositions for music, 377–385
intervals. See pitch intervals
intonation. See speech intonation
IPA (International Phonetic Alphabet), 38, 53, 59
isochrony
in speech perception, 143–144, 152–154
in speech production, 119–122
Ives, Charles, 9n.1
Japanese
affective speech intonation, 345n.15
musical rhythm, 161n.26, 171, 173
perception of L-R contrast, 68–70
perception of rhythmic grouping, 170
speech rhythm, 121–122, 127, 128, 134, 136–137, 148, 171–173
Javanese music, 17–20, 21n.9, 28, 83, 100, 241, 300–302, 314, 325–326, 383, 400
Jaws, theme from, 328
Jukun language, 44
Kanzi, 409. See also chimpanzees and bonobos
key, musical, 251–253
King, Martin Luther, Jr., 154, 319
Kircher, Athanasius, 16
Klangfarbenmelodie, 34
leitmotif, 328–332
Lévi-Strauss, Claude, 300
linguistic diversity, 39, 205
lullabies. See infant-directed singing
Mandarin, 39, 47–48
meaning
defined, 303–304
linguistic vs. musical, 327–342, 342–344
musical, 305–326
semantic vs. pragmatic, 303
and syntax, 259
MEG (magnetoencephalography), 27, 106, 107, 174, 276
melodic contour deafness hypothesis, 233–238
melodic contour perception in speech and music, 225–238
melody in music. See also expectancy
defined, 182
perception, 190–204
relations to speech intonation. See speech intonation
statistical regularities, 218–221
structural relations within, 190–204
melody in speech. See speech intonation
meter. See rhythm
metrical grid
differences between musical and linguistic, 141, 403
linguistic, 139–140
musical, 103–105, 157
metrical phonology, 138–142
microtones, 17, 18
mismatch negativity. See MMN
missing fundamental, 87
MMN (mismatch negativity), 26–28, 71, 85
monkeys
perception of music by, 396–399
motherese. See infant-directed speech
motion in music, 117, 319–320
Mozart, Wolfgang Amadeus, 310, 320, 321–322, 380
myelination and music, 237, 374–375
N400 (ERP), 271–272, 286, 331–335
N5 (ERP), 335
neuroimaging evidence for overlap in music and language processing, 271–276
neuropsychology of music-language dissociations, 268–270
New Guinea, 11, 51, 364, 371
Nicaraguan sign language, 365, 369
nPVI, 131–135, 137n.19, 149, 153, 161–168, 176n.29, 177–179, 223–224
octave
defined, 13
equivalence, 13
and musical scales, 14–15, 15n.5
opera
Cantonese, 217
French, 158
leitmotif. See leitmotif
P600 (ERP), 253, 271–272, 274–275, 284
Parkinson’s disease, 103
parrots and musical rhythm, 411
Partch, Harry, 17n.7
particulate vs. holistic sound systems, 10–11
perceptual assimilation model (PAM), 70
perceptual magnet effect
in music, 81–82
neural mechanisms, 82
in speech, 80–81
perfect pitch. See absolute pitch
phoneme, 38, 51–52
phonetics vs. phonology, 37–38
phonological skills and musical abilities, 78–80
phonology
of intonation. See speech intonation
of rhythm. See metrical phonology
Pirahã, 3
pitch
class, 13
defined, 12
helix, 13–14
intervals
as learned sound categories in music, 22–28
theories for special perceptual qualities, 88–93
multidimensional nature of, 13
Plato, 3–4, 315, 344
poetry
intonation, 206n.10
rhythm. See rhythm
Polish speech rhythm, 124, 127–128, 134, 135–137, 168
Portuguese speech rhythm, 128
pragmatics and music, 335–342. See also meaning
prolongation reduction, 240, 257–259, 263, 265
prosody, 22, 119, 149, 159, 161, 163, 175, 191, 222–223, 225, 231, 344–345, 347–349.
See also rhythm; speech intonation
pure word deafness, 75, 270, 386
Pythagoras, 15, 19
Quechua language and music, 169
ragas, 17, 18, 313–314
Rameau, Jean-Philippe, 159, 259
reading abilities and music, 78
recursive structure, 139n.22, 243, 265, 267, 395
Reich, Steve, 86
resource-sharing in language and music. See SSIRH
rhythm
beat, 99–103, 402–411
defined, 96
disorders, 175, 404
expressive timing, 114–116
grouping, 106–112, 168–174
linguistic, 118–154
typology, 119–138
meter
linguistic, 138–142, 403
musical, 103–106, 403
and movement, 99–100, 117, 415
musical, 97–117
neural processing in speech vs. music, 173–176, 404–405, 410
perception by infants, 128–129, 137, 173, 405–408, 412–415
periodic vs. nonperiodic, 96, 159
poetic, 154–156
psychological dimensions, 117
of speech reflected in music, 159–168
of speech reflected in nonlinguistic rhythm perception, 168–173
and synchronization, 99–103, 402–405, 408–411
Western European vs. other traditions, 97–98
Riemann, Hugo, 266–267
Rousseau, Jean-Jacques, 4
scales, musical
asymmetric vs. symmetric, 20–21, 245, 248, 380
cultural diversity and commonality, 16–22
general features, 14–16
Western vs. Indian, 17–18
Western vs. Javanese, 18–19, 83
whole-tone, 20
Schenker, Heinrich, 201, 254, 307
Schoenberg, Arnold, 34, 86
second language (L2) proficiency and musical ability, 78–79
segmentation problem for speech, 61, 84, 147–148, 169
semantics and music, 327–335. See also meaning
semitone, 15, 87
sexual selection, 368–369
shared intentionality, 359
shared sound category learning mechanism hypothesis (SSCLMH), 72
shared syntactic integration resource hypothesis (SSIRH), 268, 283–297
Shebalin, Vissarion, 270
sign language, 363–364. See also Nicaraguan sign language
Simon, Paul, 343
sine-wave speech, 76–77
song
relations between linguistic and musical meaning, 342–344
relations between linguistic and musical melody, 216–218
relations between linguistic and musical rhythm, 156–159
sound
category learning in speech and music, 71–85
color. See timbre
perceptual dimensions, 12
symbolism, 62–67
Spanish
musical rhythm, 166, 178–179
speech rhythm, 119–120, 122–123, 127, 128, 131, 133–134, 136–138, 148–149, 152, 154
spectrogram, 60–63
speech acoustics. See consonants; vowels
speech intonation
affective, in relation to music, 345–347
autosegmental-metrical theory, 188, 191, 206–211
differences from musical melody, 183–185, 204–205
fundamental frequency (F0), 185–186
linguistic vs. affective, 183, 185–186
perception by normal individuals, 211–216
perception by tone-deaf individuals, 228–233
reflected in instrumental music, 218–225
reflected in vocal music, 216–218
speech mode of perception, 76–77
speech surrogates. See talking drums; whistled speech
Spencer, Herbert, 317n.7, 345
SSIRH. See shared syntactic integration resource hypothesis
statistical learning
defined, 84
of pitch patterns, 220, 224, 385
of rhythm patterns, 165
of segmentation boundaries, 84, 224, 395
of sound categories, 81, 84–85, 385
Steele, Joshua, 182, 186–187, 211
Strauss, Richard, 158
Stravinsky, Igor, 393
stress. See also syllable
deafness, 138
shift, 126, 140–142
timing, 118–126
superior temporal gyrus, 73–74
syllable
defined, 39
stressed and unstressed, 37–38, 100, 110, 111n.6, 118–120, 123, 139–140
timing, 118–126
synchronization
in animal displays vs. human music, 408–409
to a beat. See rhythm
syntax
defined, 241
interference between musical and linguistic processing of, 285–290
musical, 242–262
musical vs. linguistic, 262–267
shared neural resources for musical and linguistic, 282–284
tabla, 34–37, 63–67
talking drums, 48–49, 77n.30
Ticuna language, 12, 42
timbre
linguistic, 50–60
mapped onto musical sounds, 62–67
musical, 28–37
rarity as a basis for musical systems, 30–34
time-span reduction, 240, 254–256, 258, 263
tonal pitch space (TPS) theory, 280–282
tonality
defined, 198
disorders of perception, 268–269
in music, 198–203, 242, 260
perception by animals, 398–400
perception in aphasia, 290–297
tone deafness, musical
defined, 228–229
genetics, 391–393
and rhythm, 176
and speech intonation perception, 228–238
tone languages
and absolute pitch, 46–48
defined, 40
level tones (vs. contour tones), 41–46
mapped onto musical instruments, 48–50
songs, 216–217
tone spacing, 42–45, 94
tone painting, 320
topics, musical, 321–322
trance, 324–325
translating music, 300–303
universals
of auditory perception, 170, 173, 305
linguistic, 40, 119, 142, 148, 206, 242, 276, 367
musical, 11–13, 19, 93, 97–98, 196, 259, 301, 305, 312–315, 357, 367, 376, 398–401
van Gulik, Robert, 97
vervet monkey, 10
vocables, 36, 62–67
vocal learning, 361, 410–411
vocal tract anatomy, 360–361
vowels
brain responses to native vs. nonnative contrasts, 71
IPA chart, 59
production and acoustics, 54–61
reduction and speech rhythm, 123–125, 129, 131, 134, 149–150, 161–162
Wagner, Richard, 328–332, 393
Wernicke’s area, 276
whale song, 10, 244, 259, 265, 355, 368
whistled speech, 49–50
Wittgenstein, Ludwig, 4
word deafness. See pure word deafness
xylophone tuning flexibility, 18
Zulu click sounds. See clicks, linguistic